Abstract. We propose a new approach to heap analysis through an abstract
domain of automata, called automatic shapes. The abstract domain uses a
particular kind of automata, called quantified data automata on skinny trees
(QSDAs), that allows one to define universally quantified properties of
singly-linked lists. To ensure convergence of the abstract fixed-point
computation, we introduce a sub-class of QSDAs called elastic QSDAs, which
also form an abstract domain. We evaluate our approach on several
list-manipulating programs and we show that the proposed domain is powerful
enough to prove a large class of these programs correct.

1 Introduction



The abstract analysis of heap structures is an important problem in program
verification, as dynamically evolving heaps are ubiquitous in modern
programming, either in terms of low-level pointer manipulation or in
object-oriented programming. Abstract analysis of the heap is hard because
abstractions need to represent a heap of unbounded size, and must capture both
the structure of the heap and the unbounded data stored in it. While several
data-domains have been investigated for data stored in static variables, the
analysis of the unbounded structure and unbounded data that a heap contains has
been less satisfactory. The primary abstraction that has been investigated is
the rich work on shape analysis [29]. However, unlike abstractions for
data-domains (like intervals, octagons, polyhedra, etc.), shape analysis
requires carefully chosen instrumentation predicates to be given by the user,
and these are often particular to the program that is being verified. Shape
analysis techniques typically merge all nodes that satisfy the same unary
predicate, achieving finiteness of the abstract domain, and interpret the other
predicates using a 3-valued (must, must not, may) abstraction. Moreover, these
instrumentation predicates often need to be encoded in particular ways (for
example, capturing binary predicates as particular kinds of unary predicates)
so as to not lose precision.

For instance, consider a sorting algorithm that has an invariant of the form:

    $\forall x, y.\ ((x \to^{*}_{next} y \wedge y \to^{*}_{next} i) \Rightarrow d(x) \le d(y))$

which says that the sub-list before pointer i is sorted. In order to achieve a
shape-analysis algorithm that discovers this invariant (i.e., captures this
invariant precisely during the analysis), we typically need instrumentation
predicates such as $p(z) = z \to^{*}_{next} i$,
$s(x) = \forall y.\ ((x \to^{*}_{next} y \wedge y \to^{*}_{next} i) \Rightarrow d(x) \le d(y))$,
etc. The predicate $s(x)$ says that the element that is at x is less than or
equal to the data stored in every cell between x and i. These instrumentation
predicates are clearly too dependent on the precise program and property being
verified.

In this paper, we investigate an abstract domain for heaps that works without
user-defined instrumentation predicates (except we require that the user fix an
abstract domain for data, like octagons, for comparing data elements).

We propose a radically new approach to heap analysis through an abstract
domain of automata, called automatic shapes (automatic because we use
automata). The abstract domain uses a particular kind of automata, called
quantified data automata, that define, logically, universally quantified
properties of
heap structures. In this paper, we restrict our attention to heap structures that 
have only one pointer field; our analysis is hence one that can be used to analyze 
properties of heaps containing lists, with possible aliasing (merging) of them, es- 
pecially at intermediate stages in the program. One-pointer heaps can be viewed 
as skinny trees (trees where the number of branching nodes is bounded). 

Automata, in general, are classical ways to capture an infinite set of objects
using finite means. A class of (regular) skinny trees can hence be represented
using tree automata, capturing the structure of the heap. While similar ideas
have been explored before in the literature [17], our main aim is to also
represent properties of the data stored in the heap, building automata that can
express universally quantified properties on lists, in particular those of the
form

    $\bigwedge_i \forall x.\ (Guard_i(p, x) \Rightarrow Data_i(d(p), d(x)))$

where p is the set of static pointer variables in the program. The $Guard_i$
formulas express structural constraints on the quantified variables and the
pointer variables, while the $Data_i$ formulas express properties about the
data stored at the nodes pointed to by these pointers. In this paper, we
investigate an abstract domain that can infer such quantified properties,
parameterized by an abstract numerical domain $\mathcal{F}_d$ for the data
formulas and by the number of quantified variables x.

The salient aspect of the automatic shapes that we build is that (a) there 
is no requirement from the user to define instrumentation predicates for the 
structural Guard formulas; (b) since the abstraction will not be done by merging 
unary predicates and since the automata can define how data stored at multiple 
locations on the heap are related, there is no need for the user to define carefully 
crafted unary predicates that relate structure and data (e.g., the unary
predicate $s(x)$ defined above that says that the location x is sorted with
respect to all successive locations that come after x but before i). Despite
this lack of
help from the user, we show how our abstract domain can infer properties of 
a large number of list-manipulating programs adequately to prove interesting 
quantified properties. 

The crux of our approach is to use a class of automata, called quantified 
data automata on skinny trees (QSDA), to express a class of single-pointer 



heap structures and the data contained in them. QSDAs read skinny trees with 
data along with all possible valuations of the quantified variables, and for each 
of them check whether the data stored in these locations (and the locations 
pointed to by pointer variables in the program) relate in particular ways defined 
by the abstract data-domain $\mathcal{F}_d$. We show that the class of QSDAs
(over a data-domain $\mathcal{F}_d$ and a set of variables x) forms an abstract
domain lattice.
Along with the natural concretization and abstraction relations, this class forms 
a Galois connection with respect to the class of concrete single-pointer heap data 
structures. 

We further show, for a simple heap-manipulating programming language, 
that we can define an abstract post operator over the abstract domain of QSDAs. 
This abstract post preserves the structural aspects of the heap precisely (as
QSDAs can have an arbitrary number of states to capture the evolution of
the program) and soundly abstracts the quantified data properties. The
abstract post is nontrivial to define and to show effective, as it requires
automata-theoretic operations that need to simultaneously preserve structure as
well as data properties; this forms the hardest technical aspect of our paper.
We thus obtain an effective abstract interpretation using the domain of QSDAs.

Traditionally, in order to handle loops and reach termination, abstract do- 
mains require some form of widening. Our notion of widening is directed by 
decidability considerations. Assume that the programmer computes a QSDA as 
an invariant for the program at a particular point, where there is an assertion 
expressed as a quantified property p over lists (such as "the list pointed to by 
head is sorted"). In order to verify that the abstraction proves the assertion, we 
will have to check if the language of lists accepted by the QSDA is contained 
in the language of lists that satisfy the property p. However, this is in general 
undecidable. Our aim is to overapproximate the QSDA into a larger language 
accepted by a particular kind of data automata, called elastic QSDA (EQSDA) 
for which this inclusion problem is decidable (for an appropriately chosen lan- 
guage for expressing assertions). 

This elastification will in fact serve as the basis for widening as well, as there 
are only a finite number of elastic QSDAs that express structural properties, 
discounting the data-formulas. Consequently, we can combine the elastification 
procedure (which overapproximates a QSDA into an elastic QSDA) and widen- 
ing over the numerical domain for the data in order to obtain widening pro- 
cedures that can be used to accelerate the computation for loops. In fact, the 
domain of EQSDAs can be seen as an abstract domain, and there is a natu- 
ral abstract interpretation between QSDAs and EQSDAs, where the EQSDAs 
permit widening procedures. We prove a unique elastification theorem, which
shows that for any QSDA, there is a unique elastic QSDA that over-approximates
it. This elastification is in fact the abstraction map $\alpha$ that connects
QSDAs with EQSDAs (the $\gamma$ map being identity, as EQSDAs are also QSDAs).

We also show that EQSDA properties over lists can be translated to a decid- 
able fragment of the logic Strand [21] over lists, and hence inclusion checking 
an elastic QSDA with respect to any assertion that is also written using the 



decidable sublogic of Strand over lists is decidable. The notions of QSDAs and
elasticity are extensions of recent work in [12], where such notions were
developed for words (as opposed to trees) and where the automata were used for
learning invariants from examples and counter-examples.

We implement our abstract domain and transformers and show, using a 
suite of list-manipulating programs, that our abstract interpretation is able 
to prove the naturally required (universally-quantified) properties of these pro- 
grams. While several earlier approaches (such as shape analysis) can tackle the 
correctness of these programs as well, our abstract analysis is able to do this 
without requiring program-specific help from the user (for example, in terms of 
instrumentation predicates in shape analysis, and in terms of guard patterns in 
the work by Bouajjani et al. [5]).

Related Work. Shape analysis [53] is one of the most well-known techniques
for synthesizing invariants about dynamically evolving heaps. However,
shape analysis requires user-provided instrumentation predicates which are
often too particular to the program being verified. Hence coming up with these
instrumentation predicates is not an easy task. In recent work [6][7][15][24],
several abstract domains have been explored which combine the shape and the data
constraints. Though some of these domains [TlIS^ can handle heap structures 
more complex than singly-linked lists, all these domains require the user to
provide a set of data predicates [15] or a set of structural guard patterns [5]
or predicates over both the structure and the data constraints [7][24]. In
contrast,
the only assistance our technique requires from the user is specifying the number 
of universally quantified variables. 

For singly-linked lists, [33] introduces a family of abstractions based on a set 
of instrumentation predicates which track uninterrupted list segments. However 
these abstractions only handle structural properties and not the more-complex 
quantified data properties. Several separation logic based shape analysis tech- 
niques have also been developed over the years [HISl fTTllTC] . But they too mostly 
handle only the shape properties (structure) of the heap. 

Our automaton model for representing quantified invariants over lists is
inspired by the decidable fragment of Strand [21] and can track invariants with
guard constraints of the form y < t or t < y for a universal variable y and some
term t. These structural constraints on the guard are very similar to array par- 
titions in [5i rni[T5] . However, our automata model is more general. For instance, 
none of these related works can handle sortedness of arrays which requires quan- 
tification over more than one variable. 

Techniques based on Craig's interpolation have recently emerged as an
orthogonal way for synthesizing quantified invariants over arrays and lists [1]
[2D1[351[3D] . These methods use different heuristics like term abstraction [3^ or 
introduction of existential ghost variables [T| or finding interpolants over a re- 
stricted language [IHIISS] to ensure the convergence of the interpolant from a 
small number of spurious counter-examples. The shape analysis proposed in [28] 
is also counter-example driven. It requires certain quantified predicates to be
provided by the user. Given these predicates, it uses a CEGAR loop for
incrementally improving the precision of the abstract transformer and also
discovering new predicates on the heap objects that are part of the invariant.

Automata-based abstract interpretation has been explored in the past [17]
for inferring shape properties about the heap. However, in this paper we are
interested in strictly richer universally quantified properties on the data
stored in the heap. [2] introduces a streaming transducer model for algorithmic
verification of single-pass list-processing programs. However, the transducer
model severely constrains the class of programs it can handle; for example, [2]
disallows repeated or nested list traversals which are required in sorting
routines, etc.

In this paper we introduce a class of automata called quantified skinny- 
tree data automata (QSDA) to capture universally quantified properties over 
skinny-trees. The QSDA model is an extension of recent work in [12] where a
similar automata model was introduced for words (as opposed to trees). Also,
the automata model in [12] was parameterized by a finite set of data formulas
and was used for learning invariants from examples and counter-examples. In
contrast, we extend the automata in [12] to be instantiated with a
(possibly-infinite) abstract domain over data formulas and develop a theory of
abstract
interpretation over QSDAs. 



2 Programs Manipulating Heap and Data 

We consider sequential programs manipulating acyclic singly-linked data
structures. A heap structure is composed of locations (also called nodes). Each
location is endowed with a pointer field next, which either points to another
location or is undefined, and a data field called data that takes values from a
potentially infinite domain D (e.g., the set of integers). For simplicity we
assume a special location, called dirty, that models un-allocated memory space.
We assume that the next pointer field of dirty is undefined. Besides the heap
structure, a program also has a finite number of pointer variables, each
pointing to a location in the heap structure, and a finite number of data
variables over D. Our programming language has no procedure calls; we handle
non-recursive procedure calls by inlining the code at call points. In the rest
of the section we formally define the syntax and semantics of these programs.

    <prgm>      ::= pointer p_1, ..., p_k; data d_1, ..., d_l; <pc_stmt>+
    <pc_stmt>   ::= pc : <stmt>;
    <stmt>      ::= <ctrl_stmt> | <heap_stmt>
    <ctrl_stmt> ::= d_i := <data_expr> | skip | assume(<pred>)
                  | if <pred> then <pc_stmt>+ else <pc_stmt>+ fi
                  | while <pred> do <pc_stmt>+ od
    <heap_stmt> ::= new p_i | p_i := nil | p_i := p_j
                  | p_i := p_j -> next | p_i -> next := nil | p_i -> next := p_j
                  | p_i -> data := <data_expr>

Fig. 1: Simple programming language.

Syntax. The syntax of programs is defined by the BNF grammar of Figure 1. A
program starts with the declaration of pointer variables, among which one is
called nil, followed by a declaration of data variables. Data variables range
over a potentially infinite data domain D. We assume a language of data
expressions built from data variables and terms of the form $p_i \to data$
(with $p_i \neq nil$)



using operations over D. Predicates in our language are either data predicates
built from predicates over D or structural predicates concerning the heap built
from atoms of the form $p_i == p_j$, $p_i \to next == p_j$, and
$p_i \to^{*} next == p_j$, for some $i, j \in [1, k]$. Thereafter, there is a
non-empty list of labelled statements of the form pc : <stmt> where pc is the
program counter and <stmt> defines a language of either C-like statements or
statements which modify the heap. We do not have an explicit statement to free
locations of the heap: when a location is no longer reachable from any location
pointed to by a pointer variable we assume that it automatically disappears
from the memory. For a program P, we denote with PC the set of all program
counters of P statements. Figure 2(a) shows the code for the program sorted
list-insert, which is a running example in the paper. The program inserts a key
into the sorted list pointed to by variable head.

Semantics. A configuration C of a program P with set of pointer variables PV
and data variables DV is a tuple (pc, H, pval, dval) where

- $pc \in PC$ is the program counter of the next statement to be executed;

- H is a heap configuration represented by a tuple (Loc, next, data) where
  (1) Loc is a finite set of heap locations containing a special element called
  dirty, (2) $next : Loc \mapsto Loc$ is a partial map defining an edge relation
  among locations such that the graph (Loc, next) is acyclic, and
  (3) $data : Loc \mapsto D$ is a map that associates each location of Loc with
  a data value in D;

- $pval : PV \mapsto Loc$ associates each pointer variable of P with a location
  in H. If pval(p) = v we say that node v is pointed to by variable p.
  Furthermore, each node in Loc is reachable from a node pointed to by a
  variable in PV. There is no outgoing (next) edge from location dirty and
  there is a next edge from the location pointed to by nil to dirty;

- $dval : DV \mapsto D$ is a valuation map for the data variables.
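
As a concrete reading of the conditions above, the following minimal Python sketch represents a configuration and checks its well-formedness. The names `Config`, `nxt`, `is_well_formed` and the location names are assumptions made for illustration; data variables are omitted, and the paper does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

DIRTY = "dirty"  # special location modelling un-allocated memory


@dataclass
class Config:
    pc: int
    nxt: Dict[str, Optional[str]]  # next: partial map, None means undefined
    data: Dict[str, int]           # data value stored at every location
    pval: Dict[str, str]           # pointer variable -> location


def is_well_formed(c: Config) -> bool:
    locs = set(c.nxt)
    # dirty exists, has no outgoing next edge, and data is total on Loc
    if DIRTY not in locs or c.nxt[DIRTY] is not None or set(c.data) != locs:
        return False
    # next and pval only mention known locations
    if any(t is not None and t not in locs for t in c.nxt.values()):
        return False
    if any(loc not in locs for loc in c.pval.values()):
        return False
    # the location pointed to by nil has a next edge to dirty
    if "nil" not in c.pval or c.nxt[c.pval["nil"]] != DIRTY:
        return False
    # (Loc, next) is acyclic: following next never revisits a location
    for start in locs:
        seen, cur = set(), start
        while cur is not None:
            if cur in seen:
                return False
            seen.add(cur)
            cur = c.nxt[cur]
    # every location is reachable from a location pointed to by a variable
    reachable = set()
    for loc in c.pval.values():
        cur = loc
        while cur is not None and cur not in reachable:
            reachable.add(cur)
            cur = c.nxt[cur]
    return reachable == locs


# Hypothetical two-cell list: head -> n1(3) -> n2(7) -> nil -> dirty
example = Config(
    pc=2,
    nxt={"n1": "n2", "n2": "nil_loc", "nil_loc": DIRTY, DIRTY: None},
    data={"n1": 3, "n2": 7, "nil_loc": 0, DIRTY: 0},
    pval={"head": "n1", "cur": "n2", "nil": "nil_loc"},
)
```

Here `is_well_formed(example)` holds, while adding a location that no pointer variable can reach, or a cycle among the `nxt` edges, makes the check fail.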

Figure 2(b) graphically shows a program configuration which is reachable at
program counter 8 of the program in Figure 2(a) (as explained later we encode
the data variable key as a pointer variable in the heap configuration). The
transition relation of a program P, denoted $\xrightarrow{stmt}_P$ for each
statement stmt of P, is defined as usual. The control-flow statements update
the program counter, possibly depending on a predicate (condition). The
assignment statements update the variable valuation or the heap structure, in
addition to moving to the next program counter. A formal semantics of programs
can be found in the appendix. Let us define the concrete transformer
$F^{\natural} = \lambda C.\{C' \mid C \xrightarrow{stmt}_P C'\}$. The concrete
semantics of a program is given as the least fixed point of a set of equations
of the form $\psi = F^{\natural}(\psi)$.
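
To illustrate the least-fixed-point characterization, here is a minimal sketch of Kleene iteration on a finite toy state space. It collapses the per-program-point equations into the single equation psi = init ∪ post(psi); the function names `lfp` and `post` and the toy counter program are assumptions for illustration only, not the paper's construction (real heap configurations are unbounded, which is why an abstract domain is needed).

```python
from typing import Callable, FrozenSet, Iterable, Tuple

State = Tuple[int, int]  # (program counter, value of x) in the toy program


def lfp(init: Iterable[State],
        post: Callable[[State], Iterable[State]]) -> FrozenSet[State]:
    """Kleene iteration for psi = init ∪ post(psi), starting from the empty set."""
    psi: FrozenSet[State] = frozenset()
    while True:
        nxt = frozenset(init) | frozenset(s2 for s in psi for s2 in post(s))
        if nxt == psi:          # psi is a fixed point, and the least one
            return psi
        psi = nxt


def post(state: State) -> Iterable[State]:
    """Toy program: 1: while (x < 3) do 2: x := x + 1 od; 3: (exit)."""
    pc, x = state
    if pc == 1:
        return [(2, x)] if x < 3 else [(3, x)]
    if pc == 2:
        return [(1, x + 1)]
    return []                   # no successor at the exit point


reach = lfp([(1, 0)], post)
```

The iteration terminates here only because the toy program reaches finitely many configurations.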

To simplify the presentation of the paper, we assume that our programs do
not have data variables. This restriction does not reduce their expressiveness:
we can always transform a program P into an equivalent program P' by
translating each data variable d into a pointer variable that points to a
fresh node in the heap structure, in which the value of d is now encoded by
$d \to data$. The node pointed to by d is not pointed to by any other pointer;
further, $d \to next$ points to dirty. Every occurrence of d in P is replaced
by $d \to data$ in P'.
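
On the level of configurations, this encoding can be sketched as follows. The function name `encode_data_vars` and the fresh-location naming scheme `cell_<d>` are hypothetical; the sketch only shows how one configuration is transformed, not the accompanying rewriting of program statements.

```python
from typing import Dict, Optional, Tuple

Heap = Tuple[Dict[str, Optional[str]], Dict[str, int], Dict[str, str]]


def encode_data_vars(nxt: Dict[str, Optional[str]],
                     data: Dict[str, int],
                     pval: Dict[str, str],
                     dval: Dict[str, int],
                     dirty: str = "dirty") -> Heap:
    """Turn every data variable d into a pointer to a fresh one-cell node."""
    nxt, data, pval = dict(nxt), dict(data), dict(pval)  # leave inputs untouched
    for d, value in dval.items():
        node = f"cell_{d}"        # hypothetical name for the fresh location
        assert node not in nxt, "fresh location must not already exist"
        nxt[node] = dirty         # d -> next points to dirty
        data[node] = value        # the value of d is stored in d -> data
        pval[d] = node            # no other pointer variable points to this node
    return nxt, data, pval


nxt0 = {"n1": "nil_loc", "nil_loc": "dirty", "dirty": None}
data0 = {"n1": 3, "nil_loc": 0, "dirty": 0}
pval0 = {"head": "n1", "nil": "nil_loc"}
nxt1, data1, pval1 = encode_data_vars(nxt0, data0, pval0, {"key": 6})
```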

3 Quantified Skinny-Tree Data Automata

In this section we define quantified skinny-tree data automata (QSDAs, for
short), an accepting mechanism of program configurations (represented as
special labelled trees) on which we can express properties of the form
$\bigwedge_i \forall y_1, \ldots, y_\ell.\ Guard_i \Rightarrow Data_i$, where
variables $y_i$ range over the set of locations of the heap, $Guard_i$
represent quantifier-free structural constraints among the pointer variables
and the universally quantified variables $y_i$, and $Data_i$ (called data
formulas) are quantifier-free formulas that refer to the data stored at the
locations pointed to either by the universal variables $y_i$ or the pointer
variables, and compare them using operators over the data domain. In the rest
of this section, we first define heap skinny-trees, which are suitable
labelled-tree encodings of program configurations; we then define valuation
trees, which are obtained from heap skinny-trees by adding to the labels an
instantiation of the universal variables. Quantified skinny-tree data automata
are a mechanism designed to recognize valuation trees. The language of a QSDA
is the set of all heap skinny-trees such that all valuation trees deriving from
them are accepted by the QSDA. Intuitively, the heap skinny-trees in the
language defined by the QSDA are all the program configurations that satisfy
the formula $\bigwedge_i \forall y_1, \ldots, y_\ell.\ Guard_i \Rightarrow Data_i$.

Let T be a tree. A node u of T is branching whenever u has more than
one child. For a given natural number k, T is k-skinny if it contains at most k
branching nodes.
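
The definition translates directly into a check on a tree given as a map from each node to its list of children. The representation and the names `branching_nodes`, `is_k_skinny` and the sample tree are assumptions for illustration.

```python
from typing import Dict, List


def branching_nodes(children: Dict[str, List[str]]) -> List[str]:
    """Nodes with more than one child."""
    return [u for u, kids in children.items() if len(kids) > 1]


def is_k_skinny(children: Dict[str, List[str]], k: int) -> bool:
    """A tree is k-skinny if it has at most k branching nodes."""
    return len(branching_nodes(children)) <= k


# Hypothetical tree with two branching nodes: "dirty" and "n3"
tree = {
    "dirty": ["nil_loc", "key_cell"],
    "nil_loc": ["n3"],
    "n3": ["n2", "n4"],
    "n2": ["n1"],
    "n1": [],
    "n4": [],
    "key_cell": [],
}
```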

Heap skinny-trees. Let PV be the set of pointer variables of a program P
and $\Sigma = 2^{PV}$ (let us denote the empty set with a blank symbol b). We
associate with each P configuration C = (pc, H, pval, dval) with
H = (Loc, next, data), the $(\Sigma \times D)$-labelled graph
$\mathcal{H} = (T, \lambda)$ whose nodes are those of Loc, and where (u, v) is
an edge of T iff next(v) = u (essentially we reverse all next edges). From the
definition of program configurations it is easy to see that T is a k-skinny
tree where k = |PV|. The labelling function
$\lambda : Loc \mapsto (\Sigma \times D)$ is defined as follows: for every
$u \in Loc$, $\lambda(u) = (S, d)$ where S is the set of all pointer variables
p such that pval(p) = u, and d = data(u). We call $\mathcal{H}$ the heap
skinny-tree of C. In general heap skinny-trees can be logically characterized
as follows.

Definition 1 (Heap Skinny-Trees). A heap skinny-tree over a set of pointer
variables PV (with $nil \in PV$) and data domain D, is a
$(\Sigma \times D)$-labelled k-skinny tree $(T, \lambda)$ with
$\Sigma = 2^{PV}$ and k = |PV|, such that:

- for every leaf v of T, $\lambda(v) = (S, d)$ where $S \neq \emptyset$;

- for every pointer variable $p \in PV$, there is a unique node v of T such
  that $\lambda(v) = (S, d)$ with $p \in S$;

- the node v of T such that $\lambda(v) = (S, d)$ and $nil \in S$ is one of the
  children of the root of T. □
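
The construction of a heap skinny-tree from a configuration (reverse every next edge, label each node with its pointer variables and data value) can be sketched as below, together with a check of the leaf and nil conditions of Definition 1. The function names and the sample configuration, which is only in the spirit of the running example at program counter 8 with made-up data values, are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Label = Tuple[FrozenSet[str], int]  # (set of pointer variables, data value)


def heap_skinny_tree(nxt: Dict[str, Optional[str]],
                     data: Dict[str, int],
                     pval: Dict[str, str]):
    """Reverse all next edges and attach (S, d) labels to the locations."""
    children: Dict[str, List[str]] = {loc: [] for loc in nxt}
    for v, u in nxt.items():
        if u is not None:
            children[u].append(v)        # (u, v) is an edge iff next(v) = u
    labels: Dict[str, Label] = {
        loc: (frozenset(p for p, l in pval.items() if l == loc), data[loc])
        for loc in nxt
    }
    return children, labels


def satisfies_definition_1(children, labels, root: str) -> bool:
    # every leaf is pointed to by at least one pointer variable
    leaves_labelled = all(labels[v][0] for v, kids in children.items() if not kids)
    # the node labelled with nil is a child of the root
    nil_nodes = [v for v, (S, _) in labels.items() if "nil" in S]
    nil_below_root = len(nil_nodes) == 1 and nil_nodes[0] in children[root]
    return leaves_labelled and nil_below_root


# List 2 -> 5 -> 9 with a new cell (6) already linked in front of the cell 9
nxt = {"n1": "n2", "n2": "n3", "n3": "nil_loc", "n4": "n3",
       "nil_loc": "dirty", "key_cell": "dirty", "dirty": None}
data = {"n1": 2, "n2": 5, "n3": 9, "n4": 6,
        "nil_loc": 0, "key_cell": 6, "dirty": 0}
pval = {"head": "n1", "prev": "n2", "cur": "n3", "tmp": "n4",
        "nil": "nil_loc", "key": "key_cell"}
children, labels = heap_skinny_tree(nxt, data, pval)
```

In this example the node `n3` becomes branching, because both `n2` and `n4` have a next edge to it; this is the kind of intermediate tree shape that list-manipulating programs produce.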



(a)
        pointer head, cur, prev, tmp;
        data key;
     1: cur := head;
     2: while (cur != nil ∧ cur -> data < key) do
     3:     prev := cur;
     4:     cur := cur -> next;
        od
     5: new tmp;
     6: tmp -> data := key;
     7: tmp -> next := cur;
     8: if (prev != nil) then
     9:     prev -> next := tmp;
        else
    10:     head := tmp;
        fi

[Panels (b)-(d): diagrams not recoverable from the extracted text.]

Fig. 2: (a) sorted list-insert program P; (b) shows a P configuration at program
counter 8; (c) is the heap skinny-tree associated to (b); (d) is a valuation tree
of (c).



Figure 2(c) shows the heap skinny-tree corresponding to the program
configuration of Figure 2(b). Note that though the program handles a singly
linked list, in the intermediate operations we can get trees. However, they are
special trees with bounded branching. This example illustrates that program
configurations of list-manipulating programs naturally correspond to heap
skinny-trees. It also motivates why we need to extend the automata over words
introduced in [12] to quantified data automata over skinny-trees. We now define
valuation trees.

Valuation trees. Let us fix a finite set of universal variables Y. A valuation
tree over Y of a heap skinny-tree $\mathcal{H}$ is a
$(\Sigma \times (Y \cup \{-\}) \times D)$-labelled tree obtained from
$\mathcal{H}$ by adding an element from the set $Y \cup \{-\}$ to the label, in
which every element in Y occurs exactly once in the tree. We use the symbol '-'
at a node v if there is no variable from Y labelling v. A valuation tree
corresponding to the heap skinny-tree of Figure 2(c) is shown in Figure 2(d).
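
Since each label carries a single element of Y ∪ {-} and every variable of Y occurs exactly once, the valuation trees of a heap skinny-tree correspond to the placements of the variables of Y on distinct nodes. A minimal sketch of this enumeration follows; the name `valuation_trees` and the flat node-to-label representation are assumptions for illustration (the tree shape is unchanged by a valuation, so only the labels are produced).

```python
from itertools import permutations
from typing import Dict, FrozenSet, Iterator, List, Tuple

Label = Tuple[FrozenSet[str], int]
VLabel = Tuple[FrozenSet[str], str, int]


def valuation_trees(labels: Dict[str, Label],
                    univ_vars: List[str]) -> Iterator[Dict[str, VLabel]]:
    """Yield the labelling of every valuation tree over univ_vars."""
    nodes = sorted(labels)
    for chosen in permutations(nodes, len(univ_vars)):
        placed = dict(zip(chosen, univ_vars))   # node -> universal variable
        yield {v: (labels[v][0], placed.get(v, "-"), labels[v][1])
               for v in nodes}


labels = {
    "a": (frozenset({"head"}), 1),
    "b": (frozenset(), 4),
    "c": (frozenset({"nil"}), 0),
}
vals = list(valuation_trees(labels, ["y1", "y2"]))
```

A tree with n nodes and two universal variables therefore has n(n-1) valuation trees.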

Definition 2 (Quantified Skinny-Tree Data Automata). A quantified
skinny-tree data automaton (QSDA) over a set of pointer variables PV (with
|PV| = k), a data domain D, a set of universal variables Y, and a set of data
formulas F over D, is a tuple $\mathcal{A} = (Q, \Pi, \Delta, T, f)$ where:

- Q is a finite set of states;

- $\Pi = \Sigma \times \hat{Y}$ is the alphabet, where $\Sigma = 2^{PV}$ and
  $\hat{Y} = Y \cup \{-\}$;

- $\Delta = (\Delta_0, \Delta_1, \ldots, \Delta_k)$ where, for every
  $i \in [1, k]$, $\Delta_i : (Q^i \times \Pi) \mapsto Q$ defines a
  (deterministic) transition relation;

- $T : Q \mapsto 2^{PV \cup Y}$ is the type associated with every state
  $q \in Q$;

- $f : Q \mapsto F$ is a final-evaluation. □



A valuation tree $(T, \lambda)$ over Y of a program P, where N is the set of
nodes of T, is recognized by a QSDA $\mathcal{A}$ if there exists a
node-labelling map $\rho : N \mapsto Q$ that associates each node of T with a
state in Q such that for each node t of T with $\lambda(t) = (S, y, d)$ the
following holds (here $\lambda'(t) = (S, y)$ is obtained by projecting out the
data values from $\lambda(t)$):

- if t is a leaf then $\Delta_0(\lambda'(t)) = \rho(t)$ and
  $(T(\rho(t)) \cap PV) \neq \emptyset$.

- if t is an internal node, with sequence of children $t_1, t_2, \ldots, t_i$,
  then

  . $\Delta_i((\rho(t_1), \ldots, \rho(t_i)), \lambda'(t)) = \rho(t)$;

  . $S \cap T(\rho(t_j)) = \emptyset$ and $y \notin T(\rho(t_j))$, for every
    $j \in [1, i]$;

  . $T(\rho(t)) = S \cup \{y\} \cup (\bigcup_{j \in [1, i]} T(\rho(t_j)))$ if
    $y \in Y$. Otherwise, if $y = -$, then
    $T(\rho(t)) = S \cup (\bigcup_{j \in [1, i]} T(\rho(t_j)))$.

- if t is the root then $T(\rho(t)) = (PV \cup Y)$ and the formula
  $f(\rho(t))$, obtained by replacing all occurrences of terms $y \to data$ and
  $p \to data$ with their corresponding data values in the valuation tree,
  holds true.

A QSDA can be thought of as a register automaton that reads a valuation
tree in a bottom-up fashion and stores the data at the positions evaluated for
Y and the locations pointed to by elements in PV, and checks whether the
formula associated to the state at the root holds true by instantiating the
data values in the formula with those stored in the registers. Furthermore, the
role of map T is that of enforcing that each element in $PV \cup Y$ occurs
exactly once in the valuation tree.

A QSDA $\mathcal{A}$ accepts a heap skinny-tree $\mathcal{H}$ if $\mathcal{A}$
recognizes all valuation trees of $\mathcal{H}$. The language accepted by
$\mathcal{A}$, denoted $L(\mathcal{A})$, is the set of all heap skinny-trees
$\mathcal{H}$ accepted by $\mathcal{A}$. A language $\mathcal{L}$ of heap
skinny-trees is regular if there is a QSDA $\mathcal{A}$ such that
$\mathcal{L} = L(\mathcal{A})$. Similarly, a language $\mathcal{L}$ of
valuation trees is regular if there is a QSDA $\mathcal{A}$ such that
$\mathcal{L} = L_v(\mathcal{A})$, where $L_v(\mathcal{A})$ is the set of all
valuation trees recognized by $\mathcal{A}$.
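
The universal flavour of acceptance (a heap skinny-tree is accepted only if every one of its valuation trees is recognized) can be illustrated with a brute-force check. In the sketch below the automaton is replaced by a plain Python predicate for the sortedness property, the heap is restricted to a single list without the nil and dirty nodes, and all names are hypothetical; it illustrates the semantics only and is not an implementation of QSDAs.

```python
from itertools import permutations
from typing import Dict, Optional


def reaches(nxt: Dict[str, Optional[str]], a: str, b: str) -> bool:
    """True iff b is reachable from a by zero or more next steps."""
    cur: Optional[str] = a
    while cur is not None:
        if cur == b:
            return True
        cur = nxt.get(cur)
    return False


def recognizes_sorted(nxt, data, y1: str, y2: str) -> bool:
    """One valuation (y1, y2): the guard 'y1 reaches y2' implies the data formula."""
    guard = reaches(nxt, y1, y2)
    return (not guard) or data[y1] <= data[y2]


def accepts(nxt: Dict[str, Optional[str]], data: Dict[str, int]) -> bool:
    """Accept iff every placement of y1, y2 on distinct nodes is recognized."""
    return all(recognizes_sorted(nxt, data, y1, y2)
               for y1, y2 in permutations(list(nxt), 2))


nxt = {"a": "b", "b": "c", "c": None}
sorted_data = {"a": 1, "b": 4, "c": 9}
unsorted_data = {"a": 1, "b": 9, "c": 4}
```

With these inputs, `accepts(nxt, sorted_data)` holds, whereas the valuation placing y1 on `b` and y2 on `c` falsifies the data formula for `unsorted_data`.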

QSDAs generalize the quantified data automata introduced in [12]: the latter
handle only lists, whereas QSDAs handle skinny-trees. We now introduce various
characterizations of QSDAs which are used later in the paper.

Unique minimal QSDA. In [12], the authors show that it is not possible to
have a unique minimal quantified data automaton over words (with respect to the
number of states) which accepts a given language over linear heap
configurations. The proof gives a set of heap configurations over a linear
heap-structure which is accepted by two different automata having the same
number of states. Since QSDAs are a generalization of quantified data automata,
the same counter-example language holds for QSDAs. However, under the
assumption that all data formulas in F are pairwise non-equivalent, there does
exist a canonical automaton on the level of valuation trees. In [12], the
authors prove the canonicity of quantified data automata, and their result
extends to QSDAs in a straightforward manner.

Theorem 1. For each QSDA $\mathcal{A}$ there is a unique minimal QSDA
$\mathcal{A}'$ such that $L_v(\mathcal{A}) = L_v(\mathcal{A}')$.



We give some intuition behind the proof of Theorem 1. First, we introduce
a central concept called symbolic trees. A symbolic tree is a
$(\Sigma \times (Y \cup \{-\}))$-labelled tree that records the positions of
the universal variables and the pointer variables, but does not contain
concrete data values (hence the word symbolic). A valuation tree can be viewed
as a symbolic tree augmented with data values at every node in the tree. There
exists a unique tree automaton over the alphabet $\Pi$ that accepts a given
regular language over symbolic trees. It can be shown that if the formulas in F
are pairwise non-equivalent, then each state q in the tree automaton, at the
root, can be decorated with a unique data formula $f(q)$ which extends the
symbolic trees with data values such that the corresponding valuation trees are
in the given language.

Hence, a language of valuation trees can be viewed as a function that maps 
each symbolic tree to a uniquely determined formula, and a QSDA can be 
viewed as a Moore machine (an automaton with output function on states) that 
computes this function. This helps us separate the structure of valuation trees 
(the height of the trees, the cells the pointer variables point to) from the data 
contained in the nodes of the trees. We formalize this notion by introducing 
formula trees. 

Formula trees. A formula tree over pointer variables PV, universal variables
Y and a set of data formulas F is a tuple of a
$\Sigma \times (Y \cup \{-\})$-labelled tree (or in other words a symbolic
tree) and a data formula in F such that if we extend the tree with data values
which satisfy the formula, we get a valuation tree. For a QSDA which captures a
universally quantified property of the form
$\bigwedge_i \forall y_1 \ldots y_\ell.\ Guard_i \Rightarrow Data_i$, the
symbolic tree component of the formula tree corresponds to guard formulas like
$Guard_i$ which express structural constraints on the pointers pointing into
the valuation tree. The data formula in the formula tree corresponds to
$Data_i$, which expresses the data values with which a symbolic tree (read
$Guard_i$) can be extended so as to get a valuation tree accepted by the QSDA.
In our running example, consider a QSDA with a formula tree which has the same
symbolic tree as the valuation tree in Figure 2(d) (but without the data values
in the nodes) and a data formula
$\varphi = y_1 \to data < y_2 \to data \wedge y_1 \to data < key \wedge y_2 \to data > key$.
This formula tree represents all valuation trees (including the one shown in
Figure 2(d)) which extend the symbolic tree with data values which satisfy
$\varphi$.

By introducing formula trees we explicitly take the view of a QSDA as
an automaton that reads symbolic trees and outputs data formulas. We say a
formula tree $(t, \varphi)$ is accepted by a QSDA $\mathcal{A}$ if
$\mathcal{A}$ reaches the state q after reading t and $f(q) = \varphi$. Given a
QSDA $\mathcal{A}$, the language of valuation trees accepted by $\mathcal{A}$
gives an equivalent language of formula trees accepted by $\mathcal{A}$ and
vice-versa. We denote the set of formula trees accepted by $\mathcal{A}$ as
$L_f(\mathcal{A})$. A language over formula trees is called regular if there
exists a QSDA accepting the same language.
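
The view of a QSDA as a machine that reads a symbolic tree bottom-up and outputs a data formula can be sketched as follows. The transition table, the state names and the output formula (given as an uninterpreted string) are a made-up toy automaton over list-shaped trees with one pointer variable `head` and one universal variable `y`; the leaf transitions are stored under the empty tuple of child states. None of this is taken from the paper.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

SymLabel = Tuple[FrozenSet[str], str]          # (pointer variables, y or "-")
Tree = Tuple[SymLabel, List["Tree"]]           # (label, children)

EMPTY: FrozenSet[str] = frozenset()
HEAD, NIL = frozenset({"head"}), frozenset({"nil"})

# Hypothetical deterministic bottom-up transitions: (child states, label) -> state
delta: Dict[Tuple[Tuple[str, ...], SymLabel], str] = {
    ((), (HEAD, "-")): "no_y",                 # leaf: head cell, y not placed
    ((), (HEAD, "y")): "has_y",                # leaf: head cell carries y
    (("no_y",), (EMPTY, "-")): "no_y",
    (("no_y",), (EMPTY, "y")): "has_y",
    (("has_y",), (EMPTY, "-")): "has_y",
    (("has_y",), (NIL, "-")): "at_nil",
    (("at_nil",), (EMPTY, "-")): "root",       # the dirty node
}
f = {"root": "head->data <= y->data"}          # output formula at the root


def run(tree: Tree) -> Optional[str]:
    """State reached on the tree, or None if some transition is missing."""
    label, kids = tree
    states = tuple(run(k) for k in kids)
    if None in states:
        return None
    return delta.get((states, label))


def output(tree: Tree) -> str:
    """Formula for a symbolic tree; trees without a root state map to 'false'."""
    return f.get(run(tree), "false")


with_y: Tree = ((EMPTY, "-"), [((NIL, "-"), [((EMPTY, "y"), [((HEAD, "-"), [])])])])
without_y: Tree = ((EMPTY, "-"), [((NIL, "-"), [((HEAD, "-"), [])])])
```

In this toy automaton the symbolic tree `with_y` is mapped to the formula stored at the state `root`, and a tree in which `y` is never placed is mapped to `false`.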

Theorem 2. For each QSDA A there is a unique minimal QSDA A' that 
accepts the same set of formula trees. 
4 QSDAs as an Abstract Domain

In the previous section we introduced quantified skinny-tree data automata as 
an automaton model for expressing universally quantified properties over heap 
skinny-trees. In this section, we first show that QSDAs form a lattice and then 
formalize the correspondence, by establishing an abstraction function and a con- 
cretization function, between a set of heap skinny-trees and QSDAs. 

Given a set of pointer variables PV and universal variables Y, let
$\mathcal{Q}_{\mathcal{F}}$ be the set of all QSDAs over a set of data formulas
F. Clearly $\mathcal{Q}_{\mathcal{F}}$ is a partially-ordered set where the
most natural partial order is set-inclusion over the languages of QSDAs.
However, checking whether a pair of QSDAs is ordered with respect to this
partial order is undecidable. Since QSDAs generalize the quantified data
automata over words [12], the undecidability follows from the fact that
quantified data automata (as well as QSDAs) can express quantified invariants
such that checking the validity of such invariants is undecidable.

So, we consider a new partial order on QSDAs which is decidable and allows us
to define a unique least upper bound for every pair of QSDAs, and finally show
that QSDAs form a lattice. To accomplish this, let us first assume that the set
of formulas F parameterizing QSDAs forms a lattice
$\mathcal{F} = (F, \sqsubseteq_{\mathcal{F}}, \sqcup_{\mathcal{F}}, \sqcap_{\mathcal{F}}, false, true)$
where $\sqsubseteq_{\mathcal{F}}$ is the partial order on the data formulas,
$\sqcup_{\mathcal{F}}$ and $\sqcap_{\mathcal{F}}$ are the least upper bound and
the greatest lower bound, and false and true are formulas required to be in F
that correspond to the bottom and the top elements of the lattice,
respectively. Also, we assume that whenever
$\alpha \sqsubseteq_{\mathcal{F}} \beta$ then $\alpha \Rightarrow \beta$.
Furthermore, we assume that any pair of formulas in F are non-equivalent. For a
logical domain such as ours, this can be achieved by having a canonical
representative for every set of equivalent formulas.

Now if we view a QSDA as a mapping from symbolic trees to formulas in
$\mathcal{F}$, we can define a new partial-order relation on QSDAs as follows. We say
$A_1 \sqsubseteq A_2$ if $L_f(A_1) \sqsubseteq L_f(A_2)$, which means that for every symbolic tree $t$, if
$(t, \varphi_1) \in L_f(A_1)$ and $(t, \varphi_2) \in L_f(A_2)$ then $\varphi_1 \sqsubseteq_\mathcal{F} \varphi_2$. Note that
$A_1 \sqsubseteq A_2$ implies $L(A_1) \subseteq L(A_2)$. Also, with respect to this new partial
order, we can show that QSDAs form a complete lattice
$(Q_\mathcal{F}, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$, where the join of two automata $A_1$ and $A_2$ maps the
symbolic tree $t$ to the unique formula $\varphi_1 \sqcup_\mathcal{F} \varphi_2$. Similarly, the meet of the
automata $A_1$ and $A_2$ maps the tree $t$ to $\varphi_1 \sqcap_\mathcal{F} \varphi_2$. The bottom element of the
lattice $Q_\mathcal{F}$ is the QSDA which maps every symbolic tree to $\mathit{false}$, and the top
element is the QSDA which maps every symbolic tree to the formula $\mathit{true}$.
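
As a rough illustration of this pointwise order, the following Python sketch models a QSDA only by the finite map it induces from symbolic trees to data formulas. Everything here is an assumption made for illustration: symbolic trees are opaque strings, the formula lattice is replaced by intervals over a single data value, and the names `leq_f`, `join_f`, `meet_f`, `leq`, `join` and `meet` are hypothetical rather than taken from the paper or its implementation.

```python
from typing import Dict, Optional, Tuple

# Toy formula lattice: None plays the role of `false` (bottom); (lo, hi)
# constrains one data value to lo <= d <= hi; (-inf, inf) plays `true` (top).
Interval = Optional[Tuple[float, float]]
FALSE: Interval = None
TRUE: Interval = (float("-inf"), float("inf"))

def leq_f(a: Interval, b: Interval) -> bool:
    """a is below b in the toy formula lattice (so a implies b)."""
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]

def join_f(a: Interval, b: Interval) -> Interval:
    """Least upper bound: the interval hull."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def meet_f(a: Interval, b: Interval) -> Interval:
    """Greatest lower bound: the intersection, or false if it is empty."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

# An automaton is abstracted to its map from symbolic trees to formulas;
# trees that are missing from the dict are mapped to FALSE.
Automaton = Dict[str, Interval]

def leq(a1: Automaton, a2: Automaton) -> bool:
    """Pointwise order: every tree's formula in a1 is below the one in a2."""
    return all(leq_f(phi, a2.get(t, FALSE)) for t, phi in a1.items())

def join(a1: Automaton, a2: Automaton) -> Automaton:
    return {t: join_f(a1.get(t, FALSE), a2.get(t, FALSE)) for t in set(a1) | set(a2)}

def meet(a1: Automaton, a2: Automaton) -> Automaton:
    return {t: meet_f(a1.get(t, FALSE), a2.get(t, FALSE)) for t in set(a1) | set(a2)}
```

With `a1 = {"t1": (0, 5)}` and `a2 = {"t1": (3, 9), "t2": (1, 1)}`, the join maps `"t1"` to the hull of the two intervals and the meet maps it to their intersection; both operations are computed tree by tree, which mirrors the definition above.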

We now define an abstraction function $\alpha : \mathcal{H} \to Q_\mathcal{F}$ and a concretization
function $\gamma : Q_\mathcal{F} \to \mathcal{H}$ such that $(\mathcal{H}, \alpha, \gamma, Q_\mathcal{F})$ form a Galois-connection. Note
that abstract interpretation [8] requires that the abstraction function $\alpha$ maps
a concrete element (a language of heap skinny-trees) to a unique element in the
abstract domain and that $\alpha$ be surjective; similarly, $\gamma$ should be an injective
function. Also note that, given a regular language of heap skinny-trees, there
might be several QSDAs accepting that language. In such a case defining a
surjective function $\alpha$ is not possible. Therefore, we first restrict ourselves to a
set of QSDAs in $Q_\mathcal{F}$ where each QSDA accepts a different language. Under this
assumption, we define $\alpha$ and $\gamma$ as follows: for a set of heap configurations
$\mathcal{H}$, $\alpha(\mathcal{H}) = \bigsqcap \{A \mid \mathcal{H} \subseteq L(A)\}$ and $\gamma(A) = \{H \mid H \in L(A)\}$. Note that both $\alpha$
and $\gamma$ are order-preserving; $\alpha$ is surjective and $\gamma$ is injective. Also, for
a set of heap configurations $\mathcal{H}$, $\mathcal{H} \subseteq \gamma(\alpha(\mathcal{H}))$, and for a QSDA $A$, $A = \alpha(\gamma(A))$.
Hence $(\mathcal{H}, \alpha, \gamma, Q_\mathcal{F})$ form a Galois-connection.

Theorem 3. Let $(\mathcal{H}, \subseteq)$ be the set of all heap skinny-trees and $(Q_\mathcal{F}, \sqsubseteq)$ be the
set of QSDAs (accepting pairwise inequivalent languages) over data formulas $\mathcal{F}$;
then $(\mathcal{H}, \alpha, \gamma, Q_\mathcal{F})$ form a Galois-connection.



5 Abstract Semantics over QSDAs 

In the previous section we established a Galois-connection between a set of 
heap skinny-trees and QSDAs. Here, we describe an abstract transformer over 
QSDAs which soundly over-approximates the concrete semantics of the pro- 
gramming language. This provides a way to compute the semantics of a program 
over an abstract domain consisting of QSDAs. 

We first show that it is not possible to capture the most-precise concrete
transformer on QSDAs. A QSDA expresses universally quantified properties
over heap trees, of the form $\forall y_1 \ldots y_\ell.\ \varphi$, where $\varphi$ is a quantifier-free formula
over the pointer variables $PV$, the universal variables $Y$ and the data values
at the locations pointed to by these variables. Given a QSDA $A$, the concrete
transformer $F^\natural$ guesses a pre-state accepted by $A$ (which involves existential
quantification), and then constrains the post-state with respect to this guessed
pre-state according to the semantics of the statement. For instance, consider the
statement $p_i := p_j$. Given a QSDA accepting a universally quantified property
$\forall y_1 \ldots y_\ell.\ \varphi$, its strongest post-condition with respect to this statement is the
formula $\exists p_i'.\ \forall y_1 \ldots y_\ell.\ \varphi[p_i/p_i'] \wedge p_i = p_j$. Note that an interpretation of the
existentially quantified variable $p_i'$ in a model of this formula gives the location
node pointed to by variable $p_i$ in the pre-state, such that the formula $\forall y_1 \ldots y_\ell.\ \varphi$
was satisfied by the pre-state. However, it is not possible to express these precise
post-conditions, which are usually of the form $\exists^* \forall^* \varphi$, in our automaton model.
So we abstract these precise post-conditions by a QSDA which semantically
moves the existential quantifiers inside the universally quantified prefix, where
they can be eliminated. In the above example, the abstract post-condition QSDA
guesses a position for the pointer variable $p_i$ for every valuation of the universal
variables, such that the valuation tree augmented with this guessed position is
accepted by the precondition QSDA. More generally, the abstract transformer
computes the most precise post-condition over the language of valuation trees
accepted by a QSDA, instead of computing the precise post-condition over the
language of heap skinny-trees. In fact, we go beyond valuation trees to formula
trees; the abstract transformer evolves the language of formula trees accepted
by a QSDA by tracking the precise set of symbolic trees to be accepted in the
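post-QSDA and their corresponding data formulas.

Statement, followed by the abstract transformer $F^\sharp_f$ on a regular language $L_f$ over formula trees:

$p_i := \mathit{nil}$
    $\lambda L_f.\ \{(t', \varphi') \mid \varphi' = \bigsqcup \{\exists d.\ \varphi[p_i \to \mathit{data}/d] \mid (t, \varphi) \in L_f,\ \mathit{update}(t, p_i := \mathit{nil}, t')\}\}$

$p_i := p_j$
    $\lambda L_f.\ \{(t', \varphi') \mid \varphi' = (p_i \to \mathit{data} = p_j \to \mathit{data}) \sqcap \bigsqcup \{\exists d.\ \varphi[p_i \to \mathit{data}/d] \mid (t, \varphi) \in L_f,\ \mathit{update}(t, p_i := p_j, t')\}\}$

$p_i := p_j \to \mathit{next}$
    $\lambda L_f.\ \{(t', \varphi') \mid \varphi' = \bigsqcup \{\exists d.\ \varphi[p_i \to \mathit{data}/d] \mid (t, \varphi) \in L_f,\ \mathit{update}(t, p_i := p_j \to \mathit{next}, t')\} \sqcap (p_i \to \mathit{data} = v \to \mathit{data}),\ v \in \mathit{label}(t', p_i)\}$

$p_i \to \mathit{next} := \mathit{nil}$
    $\lambda L_f.\ \{(t', \varphi') \mid \varphi' = \bigsqcup \{\varphi \mid (t, \varphi) \in L_f,\ \mathit{update}(t, p_i \to \mathit{next} := \mathit{nil}, t')\}\}$

$p_i \to \mathit{next} := p_j$
    $\lambda L_f.\ \{(t', \varphi') \mid \varphi' = \bigsqcup \{\varphi \mid (t, \varphi) \in L_f,\ \mathit{update}(t, p_i \to \mathit{next} := p_j, t')\}\}$

$p_i \to \mathit{data} := \mathit{data\_expr}$
    $\lambda L_f.\ \{(t', \varphi') \mid \varphi' = \exists d.\ \varphi[v_1 \to \mathit{data}/d, \ldots, v_k \to \mathit{data}/d] \sqcap \bigsqcap \{v \to \mathit{data} = \mathit{data\_expr} \mid v \in V\},\ V = \{v_1, \ldots, v_k\} = \mathit{label}(t', p_i),\ (t', \varphi) \in L_f\}$

assume $\psi_{\mathit{struct}}$
    $\lambda L_f.\ \{(t', \varphi') \mid (t', \varphi') \in L_f,\ t' \models \psi_{\mathit{struct}}\}$

assume $\psi_{\mathit{data}}$
    $\lambda L_f.\ \{(t', \varphi') \mid \varphi' = \varphi \sqcap \psi_{\mathit{data}},\ (t', \varphi) \in L_f\}$

new $p_i$
    $\lambda L_f.\ \{(t', \varphi') \mid \varphi' = (y \to \mathit{data} = p_i \to \mathit{data}) \sqcap \bigsqcup \{\exists d_1 d_2.\ \varphi[p_i \to \mathit{data}/d_1, y \to \mathit{data}/d_2] \mid (t, \varphi) \in L_f,\ \mathit{update}(t, \mathit{new}^y\ p_i, t')\},\ y \in Y \cup \{-\}\}$

Table 1: Abstract transformer $F^\sharp_f$. The abstract transformer $F^\sharp = \lambda A.\ A'$, where $A'$ is
the unique minimal QSDA such that $L_f(A') = (F^\sharp_f)\, L_f(A)$. The predicate $\mathit{update}$ and
the set $\mathit{label}$ are defined below.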



Table 1 gives the abstract transformer $F^\sharp_f$, which takes a regular language
over formula trees $L_f$ and gives, as output, a set of formula trees. We know
from Theorem 2 that for any regular set of formula trees there exists a unique
minimal QSDA that accepts it. We show below (see Lemma 1) that for a QSDA
$A$, the language over formula trees given by $(F^\sharp_f)\, L_f(A)$ is regular. Hence, we
can define the abstract transformer $F^\sharp$ as $F^\sharp = \lambda A.\ A'$, where $A'$ is the unique
minimal QSDA such that $L_f(A') = (F^\sharp_f)\, L_f(A)$.

In Table 1, $\mathit{label}(t, p_i)$ is the set of pointer and universal variables which label
the same node in $t$ as variable $p_i$. The predicate $\mathit{update}(t, \mathit{stmt}, t')$ is true if the
symbolic trees $t$ and $t'$ are related such that the execution of statement $\mathit{stmt}$
updates precisely the symbolic tree $t$ to $t'$. As an example, the abstract
transformer for the statement $p_i := \mathit{nil}$ in the first row of Table 1 states that the
post-QSDA maps the symbolic tree $t'$ to the data formula $\varphi'$, where $\varphi'$ is the join
of all formulas of the form $\exists d.\ \varphi[p_i \to \mathit{data}/d]$, where $\varphi$ is the data formula associated
with symbolic tree $t$ in the pre-QSDA such that $\mathit{update}(t, p_i := \mathit{nil}, t')$ is true.
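
The first row of the table can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not the paper's construction: a symbolic tree is flattened to a tuple of label sets whose last position stands for nil, a data formula is a box (one interval per variable's data value), and a formula language is a finite dictionary. The helper names `forget`, `join`, `assign_nil` and `transform_assign_nil` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Toy symbolic tree: one label set per node; the last position stands for nil.
Tree = Tuple[FrozenSet[str], ...]
# Toy data formula: None is `false`; a dict bounds each variable's data value,
# and a variable that is absent from the dict is unconstrained ({} is `true`).
Formula = Optional[Dict[str, Tuple[int, int]]]

def forget(phi: Formula, p: str) -> Formula:
    """Toy version of `exists d. phi[p->data / d]`: drop the constraint on p."""
    if phi is None:
        return None
    return {v: iv for v, iv in phi.items() if v != p}

def join(a: Formula, b: Formula) -> Formula:
    """Least upper bound of two boxes: per-variable hull on shared variables."""
    if a is None:
        return b
    if b is None:
        return a
    return {v: (min(a[v][0], b[v][0]), max(a[v][1], b[v][1]))
            for v in a.keys() & b.keys()}

def assign_nil(tree: Tree, p: str) -> Tree:
    """The t' with update(t, p := nil, t') in this toy model: move p to nil."""
    moved = [labels - {p} for labels in tree]
    moved[-1] = moved[-1] | {p}
    return tuple(moved)

def transform_assign_nil(lang: Dict[Tree, Formula], p: str) -> Dict[Tree, Formula]:
    """Join `forget(phi, p)` over all pre-trees t that are updated to the same t'."""
    post: Dict[Tree, Formula] = {}
    for t, phi in lang.items():
        t2 = assign_nil(t, p)
        # post.get(t2) is None (false) the first time, and join(None, x) == x.
        post[t2] = join(post.get(t2), forget(phi, p))
    return post
```

Two pre-trees that differ only in where `p` points collapse to the same post-tree, and their formulas are joined after the data value of `p` has been quantified out, which is the shape of the first row of Table 1.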

We now briefly describe the predicate $\mathit{update}(t, \mathit{new}^y\ p_i, t')$, which is used
in the definition of the transformer for the new statement and is slightly more
involved. The statement new $p_i$ allocates a new memory location. After the
execution of this statement, pointer $p_i$ points to this allocated node. In
addition, the universal variables also need to range over this new node, apart
from the previously existing locations in the heap. The superscript $y$ in
the predicate $\mathit{update}(t, \mathit{new}^y\ p_i, t')$ tracks the case when variable $y \in Y \cup \{-\}$
is placed on the newly allocated node. Hence, if $\mathit{update}(t, \mathit{new}^y\ p_i, t')$ holds,
then the symbolic trees $t$ and $t'$ agree on the locations pointed to by all variables
except $p_i$ and the universal variable $y$; both these variables point, in $t'$, to a new
location $v$ which is not in $t$, and a new edge exists in $t'$ from the root to $v$.

Footnote: The abstract transformer defined in Table 1 assumes that there are no
memory errors in the program. It can be extended to handle memory errors.

From the construction in Table 1 it can be observed that, given a language of
valuation trees obtained uniquely from a language of formula trees, $F^\sharp_f$ applies
the most-precise concrete transformer on each valuation tree in the language,
and then constructs the smallest regular language of valuation trees (or
equivalently formula trees) which approximates this set. More precisely, for all
formula trees $(t, \varphi) \in L_f(A)$, the abstract transformer $F^\sharp_f$ applies the precise
concrete transformer on the symbolic tree $t$ (only the structure with the
valuations for universal variables) to obtain $t'$. Separately, it applies the precise
concrete transformer on the data extensions of $t$, which are given by $\varphi$, to obtain
the data formula $\varphi'$ such that $(t', \varphi') \in (F^\sharp_f)\, L_f(A)$. However, note that reasoning
over valuation/formula trees (and not heap skinny-trees) comes with a loss in
precision. To regain some of this lost precision, we define a function Strengthen
which takes a formula language $L_f$ and finds a smaller language over formula
trees which accepts the same set of heap trees. Here $t{\restriction}_y$ stands for a
$\Pi \setminus \{y\}$-labelled tree which agrees with $t$ on the locations pointed to by all
variables except $y$.

$\mathit{Strengthen} = \lambda y.\ \lambda L_f.\ \{(t', \varphi') \mid \varphi' = \varphi'' \sqcap \phi,\ (t', \varphi'') \in L_f,$
$\qquad \phi = \bigsqcap \{\exists d.\ \varphi[y \to \mathit{data}/d] \mid (t, \varphi) \in L_f,\ t{\restriction}_y = t'{\restriction}_y\}\}$

We now reason about the soundness of the operator Strengthen. Fix a $y \in Y$.
Consider a QSDA $A$ with a language over formula trees $L_f$ and consider all
symbolic trees $t$ such that $t{\restriction}_y = t'{\restriction}_y$. This implies that the trees $t$ have the
pointer variables pointing to the same positions as $t'$ and have the same
valuations for variables in $Y \setminus \{y\}$. Since our automaton model has a universal
semantics, any heap tree accepted by $A$ should satisfy the data formulas
annotated at the final states reached for every valuation of the universal
variables. If we look at a fixed valuation for variables in $Y \setminus \{y\}$ (which is the
same as that in $t'$) and different valuations for $y$, any heap tree accepted should
satisfy the formula $\exists d.\ \varphi[y \to \mathit{data}/d]$ for all such $(t, \varphi) \in L_f$. Hence the
Strengthen operator can safely strengthen the formula $\varphi''$ associated with the
symbolic tree $t'$ to $\varphi'' \sqcap \phi$. Appendix C shows that for a given universal
variable $y$ and a regular language $L_f$, the language over formula trees
$(\mathit{Strengthen})\ y\ L_f$ is regular. The proof in fact constructs the QSDA accepting
the language $(\mathit{Strengthen})\ y\ L_f(A)$ for a QSDA $A$. The abstract transformer $F^\sharp_f$
can thus be soundly strengthened by an application of Strengthen at each step,
for each variable $y \in Y$.
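
The following Python sketch illustrates the Strengthen operator on a toy model. It is only a sketch under assumptions of our own: a symbolic tree is a tuple of label sets, a data formula is a box with one interval per variable, and a formula language is a finite dictionary, so regularity and automata are not modelled. The names `forget`, `meet`, `restrict` and `strengthen` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Tree = Tuple[FrozenSet[str], ...]
# None is `false`; a dict bounds each variable's data value; {} is `true`.
Formula = Optional[Dict[str, Tuple[int, int]]]

def forget(phi: Formula, y: str) -> Formula:
    """Toy version of `exists d. phi[y->data / d]`: drop the constraint on y."""
    if phi is None:
        return None
    return {v: iv for v, iv in phi.items() if v != y}

def meet(a: Formula, b: Formula) -> Formula:
    """Greatest lower bound of two boxes; None if some interval becomes empty."""
    if a is None or b is None:
        return None
    out = dict(a)
    for v, (lo, hi) in b.items():
        if v in out:
            lo, hi = max(lo, out[v][0]), min(hi, out[v][1])
            if lo > hi:
                return None
        out[v] = (lo, hi)
    return out

def restrict(tree: Tree, y: str) -> Tree:
    """The tree with y erased: agrees with `tree` on all variables except y."""
    return tuple(labels - {y} for labels in tree)

def strengthen(lang: Dict[Tree, Formula], y: str) -> Dict[Tree, Formula]:
    """For each t', meet its formula with `forget(phi, y)` of every tree t
    that agrees with t' once y is erased."""
    out: Dict[Tree, Formula] = {}
    for t2, phi2 in lang.items():
        psi: Formula = {}  # start from `true`
        for t, phi in lang.items():
            if restrict(t, y) == restrict(t2, y):
                psi = meet(psi, forget(phi, y))
        out[t2] = meet(phi2, psi)
    return out
```

In this toy model, a bound on `p` that is known for one placement of `y` is propagated to every other placement of `y` over the same underlying tree, which is the effect the soundness argument above justifies.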

We now prove that the language over formula trees given by $(F^\sharp_f)\, L_f(A)$ is
a regular language for any QSDA $A$. This helps us to construct the abstract
transformer $F^\sharp : Q_\mathcal{F} \to Q_\mathcal{F}$. Finally, we show that this abstract transformer
is a sound approximation of the concrete transformer $F^\natural$.

Lemma 1. For a QSDA $A$, the language $(F^\sharp_f)\, L_f(A)$ over formula trees is
regular.

Proof. The proof is by construction. Given a QSDA $A$, we construct a QSDA $A'$
such that $L_f(A') = (F^\sharp_f)\, L_f(A)$.
Here we only give the construction of $A'$ for the statement $p_i := \mathit{nil}$ (for other
statements, see Appendix D). The QSDA $A'$ simulates $A$ on all the nodes, except
a node $v$ labelled with pointer $p_i$ and the node labelled $\mathit{nil}$. A tree accepted by
$A'$ does not read $p_i$ at node $v$; on the other hand, $p_i$ is read at the $\mathit{nil}$ node.
Let $A = (Q, \Pi, \Delta, T, f)$; then $A'$ is of the form $(Q, \Pi, \Delta', T', f')$. For every
transition $\Delta(q_1, \ldots, q_j, \pi) = q$ such that $p_i$ and $\mathit{nil}$ are not present in the label
$\pi$, $\Delta'(q_1, \ldots, q_j, \pi) = q$. However, if $p_i$ is present in $\pi$ and $\mathit{nil}$ is not, then the
corresponding transition in $A'$ is $\Delta'(q_1, \ldots, q_j, \pi') = q$, where $\pi'$ is the same as $\pi$
except that it does not have $p_i$. On the other hand, if $\mathit{nil}$ is present in $\pi$ and $p_i$ is
not, then the transition is $\Delta'(q_1, \ldots, q_j, \pi') = q$, where $\pi'$ is the same as $\pi$ except for
the presence of $p_i$. The new type $T'$ can be easily computed for every state in
the automaton. The evaluation function $f'$ existentially quantifies out the data
value of pointer $p_i$, i.e., for all states $q \in Q$, $f'(q) = \exists d.\ f(q)[p_i \to \mathit{data}/d]$. The
transition relation $\Delta'$ thus constructed might need to be determinized to obtain
$A'$. For a symbolic tree $t$ and formulas $\varphi_1, \ldots, \varphi_j$ such that $(t, \varphi_1), \ldots, (t, \varphi_j)$
belong to the language, the determinization procedure maps $t$ to the formula
$\varphi_1 \sqcup \ldots \sqcup \varphi_j$. It can be easily shown that the language of $A'$ is $(F^\sharp_f)\, L_f(A)$.

Appendix D shows the construction of the automaton $A'$ for the other
statements in our language. In this way, via construction, we prove that the
language $(F^\sharp_f)\, L_f(A)$ over formula trees is regular. □

From Lemma 1 and Theorem 2 it follows that there exists a QSDA $A'$ such
that $A' = F^\sharp(A)$. In fact, the proof of Lemma 1 constructs such an automaton
$A'$. The monotonicity of $F^\sharp$ with respect to the partial order $\sqsubseteq$ follows from the
monotonicity of $F^\sharp_f$. The soundness of $F^\sharp$ can be stated as the following theorem.

Theorem 4. The abstract transformer $F^\sharp$ is sound with respect to the concrete
transformer $F^\natural$.

Proof. We prove the soundness of $F^\sharp$ by showing that $F^\natural \circ \gamma \subseteq \gamma \circ F^\sharp$. Let us
consider a QSDA $A$ and a heap skinny-tree $H$ such that $H \in L(A)$. Consider
a statement $\mathit{stmt}$ such that $H$ gets transformed to $H'$ on the execution of $\mathit{stmt}$,
i.e., $F^\natural_{\mathit{stmt}}(H) = H'$. We would like to prove that $H' \in L(F^\sharp(A))$.

To prove this, consider a valuation of the universal variables $Y$ over the nodes in
$H'$. Let the corresponding symbolic tree be $t'$ and let the data values in $H'$ at
the positions pointed to by $Y$ be given by $\tau' : Y \to D$. Let us assume that the
QSDA $F^\sharp(A)$ maps the symbolic tree $t'$ to the formula $\varphi'$. Then we would like
to prove that $\tau' \models \varphi'$. By arguing over all valuations over $H'$, this would prove
that $H' \in L(F^\sharp(A))$.

To prove that $\tau' \models \varphi'$, fix a valuation of the universal variables $Y$ over the
nodes in $H$ such that the corresponding symbolic tree $t$ satisfies $\mathit{update}(t, \mathit{stmt}, t')$
(for statements which do not modify the structure of the heap, $t = t'$). Let the
data values at the positions pointed to by the universal variables $Y$ in $H$ be given
by $\tau$. Since $H \in L(A)$, if $A$ maps the symbolic tree $t$ to the formula $\varphi$ then $\tau \models \varphi$.
The abstract transformer $F^\sharp_f$ applies the precise concrete transformer, with
respect to only the data values of the heap, to the formula $\varphi$ and over-approximates
it to obtain $\varphi'$. From the monotonicity of the concrete transformer, this implies
that $\tau' \models \varphi'$. □

Since $F^\sharp$ is both monotonic and sound, from the Knaster-Tarski theorem,
the set of equations of the form $\psi = F^\sharp(\psi)$ expressing the abstract semantics of
a program admits a least fix-point solution, and this solution is a sound
approximation of the concrete semantics of the program.
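
The Knaster-Tarski theorem guarantees that the least fix-point exists; it does not by itself give a terminating procedure. On a lattice of finite height, a common way to compute it is Kleene iteration from the bottom element, sketched below in Python. This is a generic illustration rather than the paper's implementation, and the function name `lfp` and the example transformer `step` are hypothetical.

```python
from typing import Callable, FrozenSet, TypeVar

T = TypeVar("T")

def lfp(f: Callable[[T], T], bottom: T) -> T:
    """Iterate x, f(x), f(f(x)), ... from bottom until the value stabilizes.

    Assumes f is monotone and that the ascending chain starting at bottom
    is finite; otherwise this loop does not terminate.
    """
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

def step(s: FrozenSet[int]) -> FrozenSet[int]:
    """A toy monotone transformer on sets: start at 0 and add n + 2 up to 6."""
    return frozenset({0}) | frozenset(n + 2 for n in s if n + 2 <= 6) | s

reachable = lfp(step, frozenset())
```

Here `reachable` is the set `{0, 2, 4, 6}`, reached after four growing steps. Over QSDAs the chain need not be finite, which is the convergence problem that the elastic automata of the next section address.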

Note that the abstract transformer, in general, might require a powerset con- 
struction over the input QSDA, very similar to the procedure for determinizing 
a tree automaton. Hence the worst-case complexity of the abstract transformer 
is exponential in the size of the QSDA. However our experiments show that this 
worst-case is not achieved for most programs in practice. 

Theorem 5. The abstract semantics of a program, computed with respect to the
abstract transformer $F^\sharp$, is correct.

6 Elastic Quantified Skinny-Tree Data Automata

For a given set of pointer variables $PV$ and universal variables $Y$, QSDAs
can be of arbitrarily large size. The number of QSDAs is not bounded, and the
computation of the abstract semantics of a program over QSDAs might not
converge. To remedy this problem, we identify a sub-class of QSDAs called
elastic quantified skinny-tree data automata (EQSDAs). Elastic QSDAs provide a
mechanism to accelerate the fix-point computation over QSDAs. However,
instead of choosing an arbitrary acceleration mechanism, the elastic QSDAs were
chosen with the decidability of the invariants they express in mind. A key
property of the decidable fragment of Strand is that it cannot test whether two
universally quantified variables are a bounded distance apart. We show in
Section 6.1 that the invariants expressed by EQSDAs fall in the decidable
fragment of Strand. So EQSDAs not only help in guaranteeing the convergence of
the abstract semantics of a program, but also ensure that a program, if
annotated with a set of assertions over logical formulas in Strand, can be
proved correct by validating those assertions in a decidable manner.

Let us denote by $b$ the symbol $(b, -) \in \Pi$, which indicates that a position does
not contain any variable. A QSDA $A = (Q, \Pi, \Delta, T, f)$, where
$\Delta = (\Delta_0, \Delta_1, \ldots, \Delta_k)$, is called elastic if each transition on $b$ in $\Delta_1$ is a self
loop, i.e., $\Delta_1(q_1, b) = q_2$ implies $q_1 = q_2$.
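
The elasticity condition is a purely syntactic check on the unary transitions. A minimal Python sketch, assuming the unary transition function is given as a dictionary from (state, letter) pairs to states and that the blank letter is written `"b"` (both are representation choices made here, not the paper's):

```python
from typing import Dict, Tuple

BLANK = "b"  # assumed spelling of the blank letter

def is_elastic(delta1: Dict[Tuple[str, str], str]) -> bool:
    """True iff every unary transition on the blank letter is a self loop."""
    return all(src == dst
               for (src, letter), dst in delta1.items()
               if letter == BLANK)
```

For example, `{("q0", "b"): "q0", ("q0", "p"): "q1"}` is elastic, whereas `{("q0", "b"): "q1"}` is not, because reading a blank node changes the state and thereby lets the automaton count blank nodes.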

We first show that the number of EQSDAs is bounded for a fixed set $PV$ and
$Y$. Recall that heap skinny-trees accepted by QSDAs require that the number
of branching nodes in the skinny trees is bounded. So the only source of
unboundedness in the size of a skinny-tree is the unbounded number of
$b$-labelled nodes which might occur along linear segments of the tree. If we
simulate an elastic QSDA on a skinny-tree accepted by it, all consecutively
occurring $b$-labelled nodes along linear segments of the tree are labelled with
the same state (due to the elasticity property). To count the maximum number of
states required in an EQSDA to accept a language over heap trees, we might as
well consider only those trees in the language which have no $b$-labelled nodes
occurring along linear segments in the tree. For a given set $PV$ and $Y$, the
number of such skinny-trees and their sizes are bounded. This bounds the number
of states in an EQSDA which accepts any language over heap skinny-trees. This
also proves that for a given $PV$ and $Y$, the number of EQSDAs is bounded.

We next show that every QSDA $A$ can be uniquely over-approximated by a language
of valuation trees (or equivalently formula trees) that can be accepted by an
EQSDA $A_{el}$. We will refer to this construction, which we outline below, as
elastification. This result is an extension of the unique over-approximation
result for quantified data automata over words [12]. Using this result, we can
show that elastic QSDAs form a finite join semi-lattice and that there exists a
Galois-connection $(\alpha_{el}, \gamma_{el})$ between QSDAs and the set of EQSDAs.
This lets us define an abstract transformer over the abstract domain of EQSDAs
such that the semantics of a program can be computed over EQSDAs in a sound
manner and the computation terminates.

Let $A = (Q, \Pi, \Delta, T, f)$ be a QSDA such that $\Delta = (\Delta_0, \Delta_1, \ldots, \Delta_k)$, and
for a state $q$ let $R_b(q) := \{q' \mid q' = q \text{ or } \exists q''.\ q'' \in R_b(q) \text{ and } \Delta_1(q'', b) = q'\}$ be
the set of states reachable from $q$ by a (possibly empty) sequence of unary
$b$-transitions. For a set $S \subseteq Q$ we let $R_b(S) := \bigcup_{q \in S} R_b(q)$.

The set of states of $A_{el}$ consists of sets of states of $A$ that are reachable by the
following transition function $\Delta^{el}$ (where $\Delta_i(S_1, \ldots, S_i, a)$ denotes the standard
extension of the transition function of $A$ to sets of states):

$\Delta^{el}_0(a) = R_b(\Delta_0(a))$

$\Delta^{el}_1(S, a) = \begin{cases} R_b(\Delta_1(S, a)) & \text{if } a \neq b \\ S & \text{if } a = b \text{ and } \Delta_1(q, b) \text{ is defined for some } q \in S \\ \text{undefined} & \text{otherwise} \end{cases}$

$\Delta^{el}_i(S_1, \ldots, S_i, a) = R_b(\Delta_i(S_1, \ldots, S_i, a))$ for $i \in [2, k]$

Note that this construction is similar to the usual powerset construction except
that in each step we apply the transition function of $A$ to the current set of states
and take the $b$-closure. However, if the input letter is $b$ on a unary transition,
$\Delta^{el}_1$ loops on the current set if a $b$-transition is defined for some state in the set.
It can be argued inductively, starting from the leaf states, that the type of
all states in a set is the same. Hence we define the type of a set $S$ as the type of
any state in $S$. The final evaluation formula for a set is the least upper bound
of the formulas for the states in the set: $f^{el}(S) = \bigsqcup_{q \in S} f(q)$. We can now show
that $A_{el}$ is the most precise over-approximation of the language of valuation
trees accepted by the QSDA $A$.
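
The construction can be sketched in Python for the special case where only leaf and unary transitions are present, that is, ignoring the branching transitions for arities 2 and above as well as the types and evaluation formulas. The dictionary encoding of the transition functions and the names `b_closure` and `elastify` are assumptions made for this illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

BLANK = "b"  # assumed spelling of the blank letter
StateSet = FrozenSet[str]

def b_closure(states: Iterable[str], delta1: Dict[Tuple[str, str], str]) -> StateSet:
    """R_b(S): states reachable from S by zero or more unary blank transitions."""
    seen: Set[str] = set(states)
    stack: List[str] = list(seen)
    while stack:
        q = stack.pop()
        nxt = delta1.get((q, BLANK))
        if nxt is not None and nxt not in seen:
            seen.add(nxt)
            stack.append(nxt)
    return frozenset(seen)

def elastify(delta0: Dict[str, str],
             delta1: Dict[Tuple[str, str], str],
             alphabet: Iterable[str]):
    """Return the leaf and unary transition functions over sets of states."""
    letters = list(alphabet)
    d0 = {a: b_closure({q}, delta1) for a, q in delta0.items()}
    d1: Dict[Tuple[StateSet, str], StateSet] = {}
    work: List[StateSet] = list(set(d0.values()))
    seen: Set[StateSet] = set(work)
    while work:
        current = work.pop()
        for a in letters:
            if a == BLANK:
                # Blank transitions become self loops whenever some state has one.
                if any((q, BLANK) in delta1 for q in current):
                    d1[(current, a)] = current
                continue
            targets = {delta1[(q, a)] for q in current if (q, a) in delta1}
            if not targets:
                continue  # undefined on this letter
            succ = b_closure(targets, delta1)
            d1[(current, a)] = succ
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return d0, d1
```

For an automaton that reads exactly two blank nodes above nil before reaching `p` (`q0 -b-> q1 -b-> q2 -p-> q3`), the three states `q0`, `q1`, `q2` are merged into one set with a blank self loop, so the elastified automaton accepts any number of blank nodes there; the distance information is deliberately lost.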

Theorem 6. For every QSDA $A$, the EQSDA $A_{el}$ satisfies $L_v(A) \subseteq L_v(A_{el})$,
and for every EQSDA $B$ such that $L_v(A) \subseteq L_v(B)$, $L_v(A_{el}) \subseteq L_v(B)$ holds.

A proof of Theorem 6 is presented in Appendix B and is similar to the proof
of Theorem 3 in [12] for the case of words. The above theorem can also be stated
over a language of formula trees in the same way: $A_{el}$ is the most precise
over-approximation of the language of formula trees accepted by the QSDA $A$.






Using this result, we next show that EQSDAs form a finite join semi-lattice
$(Q^{el}_\mathcal{F}, \sqsubseteq, \sqcup, \bot, \top)$. The partial order on EQSDAs is the same as the partial order
on QSDAs, but now restricted to elastic QSDAs. For two EQSDAs $A_1$ and $A_2$,
the join $A_1 \sqcup A_2$ is the unique EQSDA over-approximating the join of $A_1$ and
$A_2$ in $Q_\mathcal{F}$. The bottom and the top elements are the EQSDAs taking every
symbolic tree to the formulas $\mathit{false}$ and $\mathit{true}$, respectively. We can now view the
space of EQSDAs as an abstraction over QSDAs. The abstraction function
$\alpha_{el} : Q_\mathcal{F} \to Q^{el}_\mathcal{F}$ takes a QSDA $A$ to its unique over-approximating EQSDA
$A_{el}$. The concretization function $\gamma_{el} : Q^{el}_\mathcal{F} \to Q_\mathcal{F}$ is the identity function which
maps an EQSDA to itself. Recall that we had already restricted $Q_\mathcal{F}$ to contain
only those QSDAs which accept different languages (over heap skinny-trees).
Since $Q^{el}_\mathcal{F}$ is a sub-space of $Q_\mathcal{F}$, this restriction extends to it in a natural
way. With this assumption it is easy to see that $(\alpha_{el}, \gamma_{el})$ forms a
Galois-connection.

Let us define the abstract transformer $F^\sharp_{el} : Q^{el}_\mathcal{F} \to Q^{el}_\mathcal{F}$ as
$F^\sharp_{el} = \alpha_{el} \circ F^\sharp \circ \gamma_{el}$. The soundness of $F^\sharp_{el}$ follows from the soundness of
$F^\sharp$ (and the fact that $(\alpha_{el}, \gamma_{el})$ form a Galois-connection). Similarly, its
monotonicity follows from the monotonicity of $F^\sharp$ and the monotonicity of
$\alpha_{el}$ and $\gamma_{el}$. The semantics of a program can thus be computed over the
abstract domain $Q^{el}_\mathcal{F}$ as the least fix-point of a set of equations of the form
$\psi = F^\sharp_{el}(\psi)$. Since there is a bounded number of EQSDAs for a given set of
program variables $PV$ and universal variables $Y$, this least fix-point
computation terminates (modulo the convergence of the data formulas in the
formula lattice $\mathcal{F}$, in which case termination can be achieved by defining a
suitable widening operator on the data formula lattice).
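
To illustrate what a widening operator on the data formula lattice does, here is a minimal Python sketch using plain intervals as a stand-in for the data formulas. This is the textbook interval widening, chosen for brevity; it is not the operator used in the paper's implementation, and the names `hull`, `widen` and `iterate_with_widening` are hypothetical.

```python
import math
from typing import Callable, Tuple

Interval = Tuple[float, float]

def hull(a: Interval, b: Interval) -> Interval:
    """Join of two intervals."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old: Interval, new: Interval) -> Interval:
    """Keep a bound that is stable; push a bound that keeps moving to infinity."""
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)

def iterate_with_widening(step: Callable[[Interval], Interval],
                          start: Interval) -> Interval:
    """Iterate x := widen(x, x join step(x)) until the value stabilizes."""
    x = start
    while True:
        nxt = widen(x, hull(x, step(x)))
        if nxt == x:
            return x
        x = nxt

# A loop body that increments a counter: plain iteration would never stabilize.
result = iterate_with_widening(lambda iv: (iv[0], iv[1] + 1), (0, 0))
```

Without widening, the upper bound in this example would grow by one at every iteration; with widening it jumps to infinity after the first growing step and the iteration stops at `(0, inf)`, trading precision for termination.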

Theorem 7. The abstract semantics of a program, computed with respect to the
abstract transformer $F^\sharp_{el}$, is computable and is correct.

6.1 From EQSDAs to a Decidable Fragment of Strand 

EQSDAs introduced in the previous section can express quantified data in- 
variants over acyclic singly-linked data structures. In this section we show that 
EQSDAs have a nice property that the quantified invariants expressed by them 
fall in a decidable fragment of first order logic, in particular the decidable frag- 
ment of Strand. Hence, once the fix-point computation has converged, the 
invariants expressed by the EQSDAs can be used to validate assertions in the 
program using decision procedures. In fact, the automaton model for EQSDAs 
was chosen keeping in mind the decidability of the invariants expressed by them. 
Given an EQSDA $A$ we would like to translate it to an equivalent formula
$I$ such that the set of heap skinny-trees accepted by $A$ corresponds to
the program configurations which model $I$. Recall that for any $k$-skinny-tree
$H$ accepted by an EQSDA, the number of branching nodes in $H$ is bounded
by $k$. The invariants $I$ expressed by an EQSDA are quantified formulas of
the form $\exists b_1 \ldots b_k.\ \forall y_1 \ldots y_\ell.\ \varphi$ such that, in any model satisfying $I$, the
existential variables $B = \{b_1, \ldots, b_k\}$ are always instantiated with the branching
nodes in the model. The first step of the translation associates an existential
variable $b_i$ with every state of the automaton which has more than one child (and
thus represents a branching point). Then we enumerate all simple (loopless) paths
in the automaton starting from a leaf state, say $q_l$, to a final state, say $q_f$,
and record the structural constraints over these linear segments, $\phi_{l,f}(B, PV, Y)$,
which capture the relative positions (over the relations $\mathit{next}$, $\mathit{next}^+$ and $\mathit{next}^*$) of
the pointer/universal and branching variables with respect to each other, together
with the data formula annotated at the final state, $f(q_f)$. These structural
constraints can be constructed as described in [12]. After considering every pair of
such states, the formula corresponding to an EQSDA can be expressed as

$I = \exists b_1 \ldots b_k.\ \forall y_1 \ldots y_\ell.\ \Big( \bigwedge_{f} \big( \bigwedge_{l} \phi_{l,f} \Rightarrow f(q_f) \big) \Big) \wedge \Big( \bigvee_{f} \bigwedge_{l} \phi_{l,f} \Big)$

A key property of the decidable fragment of Strand is that universal
quantification is not permitted to be over elements that are only a bounded
distance away from each other. See [12] for a proof that the structural
constraints $\phi_{l,f}$ are such that $I$ falls in the decidable fragment of Strand.



7 Experimental Evaluation 

We implemented the abstract domains over QSDAs and EQSDAs presented in
this paper, and evaluated them on several list-manipulating programs. We first
present the implementation details, followed by our experimental results. Our
prototype implementation, along with the experimental results and programs, can
be found at http://web.engr.illinois.edu/~garg11/qsdas.html



Implementation details. Given a program $P$ we compute the abstract semantics
of the program over the abstract domain $Q^{el}_\mathcal{F}$ of EQSDAs. A program is
a sequence of statements as defined by the grammar in Figure 1. In addition to
those statements, a program is also annotated with a pre-condition and a set
of assertions. The pre-condition formulas belong to a fragment of Strand over
lists and can express quantified properties such as sortedness of lists. Given
a pre-condition formula $\varphi$, we construct the smallest EQSDA (with respect to
the partial order defined on QSDAs) which accepts all the heap skinny-trees
which satisfy $\varphi$. This EQSDA gives us an abstraction of the initial
configurations of the program. Starting from these configurations we compute the
abstract semantics of the program over $Q^{el}_\mathcal{F}$. The assert statements in the
program are ignored during the fix-point computation. Once the fix-point
computation has converged, the EQSDAs can be converted back into decidable
Strand formulas (as described in Section 6.1) and the Strand decision procedure
can be used for validating the assertions.

We recall that the abstract domain $Q^{el}_\mathcal{F}$ is an abstraction of $Q_\mathcal{F}$. So, as much
as possible, we want to compute the abstract semantics over the more concrete
domain of the two, i.e., $Q_\mathcal{F}$. Therefore, for every statement in the program
we apply the abstract transformer $F^\sharp$ (and not the more abstract $F^\sharp_{el}$). The
intermediate semantic facts (QSDAs) in our analysis are thus not necessarily
elastic. However, to ensure convergence of the analysis, the QSDAs at the header
of the loops are first abstracted to elastic QSDAs using $\alpha_{el}$ before the join.

Our abstract domains are parameterized by a quantifier-free domain $\mathcal{F}$ over
the data formulas. In our experiments, we instantiate $\mathcal{F}$ with the octagon
abstract domain [55] from the Apron library [TH]. It is sufficient to capture the
pre/post-conditions and the invariants of all our programs.



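| Program               | #PV | #Y | #DV | Property checked | #Iter | Max. size of QSDA | Time (s) |
|-----------------------|-----|----|-----|------------------|-------|-------------------|----------|
| INIT                  | 2   | 1  |     | Init, List       | 4     | 19                | 0.0      |
| ADD-HEAD              | 2   | 1  |     | Init, List       | -     | 11                | 0.1      |
| ADD-TAIL              | 3   | 1  |     | Init, List       | 4     | 29                | 0.1      |
| DELETE-HEAD           | 2   | 1  |     | Init, List       | -     | 10                | 0.0      |
| DELETE-TAIL           | 4   | 1  |     | Init, List       | 5     | 51                | 0.5      |
| MAX                   | 2   | 1  |     | Max, List        | 4     | 19                | 0.1      |
| CLONE                 | 4   | 1  |     | Init, List       | 4     | 44                | 0.7      |
| FOLD-CLONE            | 5   | 1  |     | Init, List       | 5     | 57                | 3.2      |
| COPY-GE5              | 4   | 1  |     | Gek, List        | 9     | 53                | 2.6      |
| FOLD-SPLIT            | 3   | 1  |     | Gek, List        | 4     | 33                | 0.3      |
| CONCAT                | 4   | 1  |     | Init, List       | 5     | 44                | 0.7      |
| SORTED-FIND           | 2   | 2  |     | Sort, List       | 5     | 38                | 0.3      |
| SORTED-INSERT         | 4   | 2  |     | Sort, List       | 6     | 163               | 5.8      |
| BUBBLE-SORT           | 4   | 2  |     | Sort, List       | 5/18  | 191               | 42.8     |
| SORTED-REVERSE        | 3   | 2  |     | Sort, List       | 5     | 43                | 1.5      |
| EXPRESSOS-LOOKUP-PREV | 3   | 2  |     | Sort, List       | 6     | 73                | 2.2      |
| GSLIST-APPEND         | 4   |    |     | List             | 8     | 3                 | 0.0      |
| GSLIST-PREPEND        | 2   |    |     | List             | -     | 3                 | 0.0      |
| GSLIST-LAST           | 3   |    |     | Last, List       | 3     | 7                 | 0.0      |
| GSLIST-FREE           | 3   |    |     | Empty, List      | 1     | 3                 | 0.0      |
| GSLIST-POSITION       | 4   |    |     | List             | 3     | 13                | 0.0      |
| GSLIST-REVERSE        | 3   |    |     | List             | 3     | 5                 | 0.0      |
| GSLIST-CUSTOM-FIND    | 3   | 1  | 1   | Gek, List        | 4     | 29                | 0.1      |
| GSLIST-NTH            | 3   |    | 1   | List             | 3     | 7                 | 0.0      |
| GSLIST-REMOVE         | 4   |    | 1   | List             | 4     | 10                | 0.0      |
| GSLIST-REMOVE-LINK    | 5   |    |     | List             | 4     | 16                | 0.0      |
| GSLIST-REMOVE-ALL     | 5   | 1  | 1   | Gek, List        | 5     | 51                | 0.6      |
| GSLIST-INSERT-SORTED  | 5   | 2  | 1   | Sort, List       | 6     | 279               | 27.4     |
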

Table 2: Experimental results. Properties checked. List: the return pointer points to
a list; Init: the list is properly initialized with some key; Max: the returned value is
the maximum of all data values in the list; Gek: the list (or some parts of the list) has
data values greater than or equal to a key k; Sort: the list is sorted; Last: the returned
pointer is the last element of the list; Empty: the returned list is empty.



Experimental results. We evaluate our abstract domain on a suite of
list-manipulating programs (see Table 2). For every program we report the number
of pointer variables (PV), the number of universal variables (Y), the number
of data variables (DV) and the property being checked for the program. We
also report the number of iterations required for the fixed-point to converge, the
maximum size of the intermediate QSDAs and finally the time taken, in seconds,
to analyze the programs.






The names of the programs in Table 2 are descriptive, and we only describe
some of them. The program COPY-GE5 is from [6] and copies from a list only
those entries into a new list whose data value is greater than or equal to 5.
Similarly, the program FOLD-SPLIT [6] splits a list into two lists: one which has
only those entries whose data values are greater than or equal to a key k, and the
other with entries whose data value is less than k. The program
EXPRESSOS-LOOKUP-PREV is a method from the module cachePage in a
verified-for-security platform for mobile applications [32]. The module cachePage
maintains a cache of the recently used disc pages as a priority queue based on a
sorted list. This method returns the correct position in the cache at which a disc
page could be inserted. The programs in the second part of the table are various
methods adapted from the Glib list library which comes with the GTK+ toolkit and
the Gnome desktop environment. The program GSLIST-CUSTOM-FIND finds the first
node in the list with a data value greater than or equal to k, and GSLIST-REMOVE-ALL
removes all elements from the list whose data value is greater than or equal to k.
The programs GSLIST-INSERT-SORTED and SORTED-INSERT insert a key into a sorted
list.

All experiments were completed on an Intel Core i5 CPU at 2.4 GHz with
6 GB of RAM. The number of iterations is left blank for programs which do not
have loops. The BUBBLE-SORT program converges on a fixed point after 18 iterations
of the inner loop and 5 iterations of the outer loop. The size of the intermediate
QSDAs depends on the number of universal variables and the number of pointer
variables, and largely governs the time taken for the analysis of the programs. For
all programs, our prototype implementation computes their abstract semantics
in reasonable time. Moreover, we manually verified that the final EQSDAs in
all the programs were sufficient for proving them correct (this validity check for
assertions can be mechanized in the future). The results show that the abstract
domain we propose in this paper is reasonably efficient and powerful enough to
prove a large class of programs manipulating singly-linked list structures.
A Formal Semantics of Programs

In this appendix we describe the concrete semantics of the primitive statements
in our programming language defined in Figure 1. Let us assume that there is
a special program configuration Error to which the program transitions on
encountering a memory error.

Definition 3 (Strongest post-condition $\xrightarrow{s}$). Let $C = (pc, H, pval, dval)$ be a non-Error program configuration where $H = (Loc, next, data)$. Then for any statement $s$, $C'$ is the strongest post-condition of $C$ with respect to the statement $s$, written as $C \xrightarrow{s} C'$, iff $C' = (pc', H', pval', dval')$ where $H' = (Loc', next', data')$ and $pc'$ is the updated program counter and one of the following holds:

- $s = p_i := p_j$ (or $s = p_i := nil$) and $H' = H$ and $dval' = dval$ and $pval' = pval[p_i / pval(p_j)]$ (or $pval' = pval[p_i / pval(nil)]$).

- $s = p_i := p_j \to next$ and $H' = H$ and $dval' = dval$ and if $pval(p_j) = v$ and $(v, u) \in next$, then $pval' = pval[p_i / u]$. If $v = dirty$ or $v = nil$, then $C'$ is Error.

- $s = p_i \to next := p_j$ (or $s = p_i \to next := nil$) and $pval' = pval$ and $dval' = dval$ and $Loc' = Loc$ and $data' = data$ and if $pval(p_i) = v$ and $(v, u) \in next$ and if $pval(p_j) = w$ (or $pval(nil) = w$), then $next' = next \setminus \{(v, u)\} \cup \{(v, w)\}$. If $v = nil$ or $v = dirty$, then $C'$ is Error.

- $s = new\ p_i$ and $dval' = dval$ and $Loc' = Loc \cup \{v\}$, $v \notin Loc$, $next' = next \cup \{(v, dirty)\}$ and $data' = data$ and $pval' = pval[p_i / v]$.

- $s = d_i := p_i \to data$ and $H' = H$ and $pval' = pval$ and if $pval(p_i) = v$, then $dval' = dval[d_i / data(v)]$. If $v = nil$ or $v = dirty$, then $C'$ is Error.

- $s = p_i \to data := data\_expr$ and $pval' = pval$ and $dval' = dval$ and $Loc' = Loc$ and $next' = next$ and if $pval(p_i) = v$, then $data' = data[v / data\_expr]$. If $v = nil$ or $v = dirty$, then $C'$ is Error.

- $s = skip$ and $C' = C$.

- $s = assume(\psi_{struct})$ and $C' = C$ and $C \models \psi_{struct}$.

- $s = assume(\psi_{data})$ and $C' = C$ and $C \models \psi_{data}$.
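The heap-manipulating cases of this definition can be illustrated with a small interpreter. The following is only a sketch under stated assumptions, not the authors' implementation: configurations are plain dictionaries, the program counter is omitted, the `assume` statements are left out, and the names `make_config`, `post`, and the statement tags (`"assign"`, `"load_next"`, and so on) are hypothetical choices made for this illustration. Pointer variables are assumed to be assigned before they are read.

```python
import copy

NIL = "nil"      # assumed encoding of the nil location
DIRTY = "dirty"  # assumed encoding of the target of an uninitialised next field
ERROR = "Error"  # the special Error configuration


def make_config():
    """Empty heap; the pointer valuation maps nil to itself (pval(nil))."""
    return {"loc": set(), "next": {}, "data": {}, "pval": {NIL: NIL}, "dval": {}}


def _bad(v):
    """Dereferencing nil or dirty is a memory error."""
    return v in (NIL, DIRTY)


def post(c, stmt):
    """Return the successor configuration of c under stmt, or ERROR."""
    op, *args = stmt
    c2 = copy.deepcopy(c)
    if op == "assign":                 # p_i := p_j  (p_j may be NIL)
        pi, pj = args
        c2["pval"][pi] = c["pval"][pj]
    elif op == "load_next":            # p_i := p_j -> next
        pi, pj = args
        v = c["pval"][pj]
        if _bad(v):
            return ERROR
        c2["pval"][pi] = c["next"][v]
    elif op == "store_next":           # p_i -> next := p_j
        pi, pj = args
        v = c["pval"][pi]
        if _bad(v):
            return ERROR
        c2["next"][v] = c["pval"][pj]  # replaces the old edge (v, u) by (v, w)
    elif op == "new":                  # new p_i
        (pi,) = args
        v = len(c["loc"])              # fresh, since locations are never freed
        c2["loc"].add(v)
        c2["next"][v] = DIRTY
        c2["pval"][pi] = v
    elif op == "load_data":            # d_i := p_i -> data
        di, pi = args
        v = c["pval"][pi]
        if _bad(v):
            return ERROR
        c2["dval"][di] = c["data"].get(v)
    elif op == "store_data":           # p_i -> data := value
        pi, value = args
        v = c["pval"][pi]
        if _bad(v):
            return ERROR
        c2["data"][v] = value
    elif op == "skip":
        pass
    else:
        raise ValueError(f"unknown statement {op!r}")
    return c2
```

For instance, allocating a node with `("new", "p")`, terminating it with `("store_next", "p", NIL)`, loading `("load_next", "q", "p")` and then dereferencing `q` again yields `ERROR`, because `q` holds nil at that point.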

B Proof of Theorem [6] 

Note that $A_{el}$ is elastic by definition of $\Delta^{el}$. It is also clear that $L_v(A) \subseteq L_v(A_{el})$ because for each run of $A$ using states $q_0 \cdots q_n$ the run of $A_{el}$ on the same input uses sets $S_0 \cdots S_n$ such that $q_i \in S_i$, and by definition $f(q_n)$ implies $f^{el}(S_n)$.

Now let $B$ be an EQSDA with $L_v(A) \subseteq L_v(B)$. Let $t$ be a valuation tree accepted by $A_{el}$ and let $S$ be the state of $A_{el}$ reached on reading $t$. We want to show that $t \in L_v(B)$. Let $p$ be the state reached in $B$ on $t$. We show that $f(q)$ implies $f_B(p)$ for each $q \in S$. From this we obtain $f^{el}(S) \Rightarrow f_B(p)$ because $f^{el}(S)$ is the least formula that is implied by all the $f(q)$ for $q \in S$.

Pick some state $q \in S$. By definition of $\Delta^{el}$ we can construct a valuation tree $t' \in L_v(A)$ that leads to the state $q$ in $A$ and has the following property: if all letters of the form $(b, d)$ and those that have a single child are removed from $t$ and from $t'$, then the two remaining trees have the same symbolic trees. In other words, $t$ and $t'$ can be obtained from each other by inserting and/or removing $b$-letters.

Since $B$ is elastic, $t'$ also leads to $p$ in $B$. From this we can conclude that $f(q) \Rightarrow f_B(p)$, because otherwise there would be a model of $f(q)$ that is not a model of $f_B(p)$, and by changing the data values in $t'$ accordingly we could produce an input that is accepted by $A$ and not by $B$. $\square$



C Construction of the Strengthen Operator 

Given a QSDA $A$, we give below a high level sketch of how to construct the QSDA $A'$ accepting the following language of formula words: $\mathit{Strengthen}_y(L_f(A))$. The construction of the QSDA $A'$ takes place in two steps.

In the first step, we construct a QSDA $A_1$ which accepts $(\Sigma \times (Y \cup \{-\} \setminus \{y\}))$-labelled formula trees of the form

$$\{ (t'{\downarrow}_y, \varphi) \mid \varphi = \sqcap \{ \exists d.\, \psi[y \to data / d] \mid (t, \psi) \in L_f(A),\ t{\downarrow}_y = t'{\downarrow}_y \} \}.$$

In the second step, we take the cross-product of this automaton $A_1$ with the initial automaton $A$ to get the QSDA $A'$ such that if the symbolic tree $t'{\downarrow}_y$ is mapped by $A_1$ to the data-formula $\varphi$ and the symbolic tree $t'$ is mapped by $A$ to the data-formula $\psi''$, then $t'$ is mapped in the new automaton $A'$ to the data-formula $\varphi \sqcap \psi''$. The required cross-product of two QSDAs is similar to the algorithm for computing the intersection of tree automata. The automaton which computes the cross-product simulates the transitions of both the automata $A$ and $A_1$. However, since $A_1$ accepts trees which do not have the variable $y$ labeling them, the cross-product automaton on a label $\pi$ which contains $y$ simulates the transitions of $A$ on the label $\pi$, but simulates the transition of $A_1$ on the label $\pi \setminus \{y\}$. The states in the cross-product automaton are labeled with the meet of the data-formulas labeling the states of the two individual automata.

Let us now describe the construction of the automaton $A_1$. The QSDA $A_1$ accepts symbolic trees of the form $t{\downarrow}_y$ if $(t, \varphi) \in L_f(A)$. Hence, $A_1$ is an automaton which simulates the transitions of $A$ on a symbolic tree, except when it reads a node in the tree labeled with a label $\pi$ which contains variable $y$. When this happens, $A_1$ simulates the transitions of $A$ on the label $\pi \setminus \{y\}$. The data-formula mapped to each state in $A_1$ is obtained by existentially quantifying out $y \to data$ from the data-formulas mapping the corresponding states in $A$. Note that the transition relation of $A_1$ we just described might be non-deterministic; the same symbolic tree $t{\downarrow}_y$ might be mapped to more than one data-formula, e.g. $\exists d.\, \varphi_1[y \to data / d]$ and $\exists d.\, \varphi_2[y \to data / d]$ and so on. We want the automaton $A_1$ to map $t{\downarrow}_y$ to the meet of all these formulas. This can be achieved by determinizing the transition relation of the QSDA, very similarly to the determinization procedure for a tree automaton. In addition, for a set of states in the deterministic automaton, we label it with the meet of the data-formulas labeling each state in the set in the original non-deterministic automaton. In this way, the transition relation can be determinized to obtain the QSDA $A_1$. This completes the construction of $A'$.

D Construction of the Abstract Transformer 

First let us introduce some preliminary notation. For a set $S$, let $S^i$ denote the $i$-th cartesian power of $S$. For an $n$-tuple $x = (x_1, \ldots, x_n) \in X = X_1 \times \ldots \times X_n$, let us define $x{\downarrow}_i$ for $1 \le i \le n$ as the projection onto the $i$-th component of the tuple, i.e. $(x_1, \ldots, x_n){\downarrow}_i = x_i$. When the set $X_i$ can be uniquely distinguished from all other components $X_j$ of the $n$-tuple $X$, $x{\downarrow}_{X_i} = x_i$ is also used to denote the $X_i$ component of the tuple $x$. For a tuple $x$, $x' = x[X_1/x_1, \ldots, X_k/x_k]$ is used to denote the tuple which is the same as $x$ except at the $X_1, \ldots, X_k$ components, where the tuple $x'$ takes the values $x_1, \ldots, x_k$ respectively.

For a function $F : D \to X$, where $X = X_1 \times \ldots \times X_n$, which maps a domain $D$ to an $n$-tuple $X$, let $F{\downarrow}_i : D \to X_i$ for $1 \le i \le n$ be defined such that $F{\downarrow}_i(d) = F(d){\downarrow}_i$ for all $d \in D$. Also, for a function $f$, let us denote by $f|_D$ the function which is the same as $f$ but with its domain restricted to the set $D$. For a given function $f$, let us also define $f' = f[d_1/v_1, \ldots, d_k/v_k]$ as the function which is the same as $f$ except at the domain elements $d_1, \ldots, d_k$, where $f'$ takes the values $v_1, \ldots, v_k$.
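As a quick illustration of this notation, the operators can be read as the following small Python helpers. This is only a sketch: tuples are Python tuples addressed by 1-based index, finite functions are dictionaries, and the helper names (`project`, `update_tuple`, `project_fn`, `restrict`, `update_fn`) are hypothetical.

```python
def project(x, i):
    """The projection of the tuple x onto its i-th component (1-based)."""
    return x[i - 1]


def update_tuple(x, changes):
    """x[X_1/x_1, ..., X_k/x_k]; `changes` maps 1-based positions to new values."""
    return tuple(changes.get(i, v) for i, v in enumerate(x, start=1))


def project_fn(F, i):
    """The function d -> F(d) projected onto component i, for a dict F of tuples."""
    return {d: project(t, i) for d, t in F.items()}


def restrict(f, D):
    """f restricted to the domain D, for a function represented as a dict."""
    return {d: v for d, v in f.items() if d in D}


def update_fn(f, changes):
    """f[d_1/v_1, ..., d_k/v_k]: same as f except on the keys of `changes`."""
    return {**f, **changes}
```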

We now present the construction of the abstract transformer for each case of the statement stmt. Let the input QSDA be of the form $(Q, \Pi, \Delta, T, f)$ where $\Pi = (\Sigma \times (Y \cup \{-\}))$ and $\Sigma = 2^{PV}$. Let us view $\Sigma$ as a set of boolean vectors where the $i$-th bit in a vector indicates whether the pointer $p_i$ belongs to the vector or not. The output QSDA after the execution of stmt is of the form $(Q', \Pi, \Delta', T', f')$ where:

Case 1 ($s = p_i := p_j$): The evaluation function $f'$ of the automaton depends on whether $p_j$ co-occurs with $nil$ or not. To facilitate this, we split each state into two states: $Q' = Q \cup \{(q, nil) \mid q \in Q\}$. Regarding the transition relation, for all transitions $\Delta(q_1, \ldots, q_p, \pi) = q$,

- if $p_j \notin T(q)$, $\Delta'(q_1, \ldots, q_p, \pi[p_i/0]) = q$.

- if $p_j \in T(q)$ and $\pi{\downarrow}_{p_j} = 1$ and $\pi{\downarrow}_{nil} = 1$, then $\Delta'(q_1, \ldots, q_p, \pi[p_i/1]) = (q, nil)$.

- if $p_j \in T(q)$ and $\pi{\downarrow}_{p_j} = 1$ and $\pi{\downarrow}_{nil} = 0$, then $\Delta'(q_1, \ldots, q_p, \pi[p_i/1]) = q$.

- if $p_j \in T(q)$ and $\pi{\downarrow}_{p_j} = 0$, then there exists a state $q_j$, $1 \le j \le p$, such that $p_j \in T(q_j)$. Correspondingly, we add transitions $\Delta'(q_1, \ldots, q_j, \ldots, q_p, \pi[p_i/0]) = q$ and $\Delta'(q_1, \ldots, (q_j, nil), \ldots, q_p, \pi[p_i/0]) = (q, nil)$.

The type $T'$ for every state in $Q'$ is:

$$T'(q) = T'(q, nil) = \begin{cases} T(q) \cup \{p_i\} & \text{if } p_j \in T(q),\ p_i \notin T(q) \\ T(q) \setminus \{p_i\} & \text{if } p_j \notin T(q),\ p_i \in T(q) \\ T(q) & \text{otherwise} \end{cases}$$

The evaluation function depends on whether the pointer $p_j$ is $nil$ or not. Hence, $f'(q) = \exists d.\, f(q)[p_i \to data / d] \sqcap (p_i \to data = p_j \to data)$. Otherwise, $f'(q, nil) = \exists d.\, f(q)[p_i \to data / d]$.
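The transition rules and the type update of Case 1 can be transcribed fairly directly. The following Python sketch does so under simplifying assumptions that are not part of the construction itself: transitions are stored as a dictionary from `(children, label)` to a target state, a label is a `frozenset` of the pointer variables (possibly including `"nil"`) read at a node, the universal-variable component of labels is ignored, and the result is returned as a set of `(children, label, target)` triples because it may be non-deterministic. The names `assign_transformer` and `assign_type` are hypothetical.

```python
def assign_transformer(delta, T, pi, pj):
    """Transitions of the output automaton for the statement pi := pj.

    delta: dict mapping (children, label) -> state
    T:     dict mapping each state to the set of variables read below it
    """
    new_delta = set()
    for (children, label), q in delta.items():
        if pj not in T[q]:
            # pj is not read in this subtree: pi cannot be read here either.
            new_delta.add((children, label - {pi}, q))
        elif pj in label:
            # pj is read at this node: pi is now read here as well.
            target = (q, "nil") if "nil" in label else q
            new_delta.add((children, label | {pi}, target))
        else:
            # pj was read in some child subtree; keep the plain transition and
            # propagate the "pj is nil" marker from every child that read pj.
            new_delta.add((children, label - {pi}, q))
            for k, child in enumerate(children):
                if pj in T[child]:
                    lifted = children[:k] + ((child, "nil"),) + children[k + 1:]
                    new_delta.add((lifted, label - {pi}, (q, "nil")))
    return new_delta


def assign_type(Tq, pi, pj):
    """Type T'(q) = T'(q, nil) computed from the old type T(q)."""
    if pj in Tq and pi not in Tq:
        return Tq | {pi}
    if pj not in Tq and pi in Tq:
        return Tq - {pi}
    return Tq
```

The evaluation formulas $f'$ are omitted from the sketch because they depend on the chosen data-formula domain.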

Case 2 ($s = p_i := p_m \to next$): Firstly, $Q' = \{q \mid q \in Q,\ p_m \notin T(q)\} \cup \{(q, *), (q, v) \mid q \in Q,\ p_m \in T(q),\ v \in PV \cup Y \cup \{-\}\}$. The automaton transitions to a state of the form $(q, *)$ on reading the pointer variable $p_m$. This amounts to guessing the state to which the automaton transitions on reading variable $p_m$. After reading $p_m$, the automaton reads variable $p_i$ and transitions from state $(q, *)$ to a state of the form $(q', v)$ where $v$ is a pointer variable or a universal variable which is co-read with $p_i$. The variable $v$ is used later when defining the evaluation functions for states in $Q'$.

More formally, for all transitions $\Delta(q_1, \ldots, q_p, \pi) = q$,

- if $p_m \notin T(q)$ then $\Delta'(q_1, \ldots, q_p, \pi[p_i/0]) = q$.

- if $p_m \in T(q)$ and $\pi{\downarrow}_{p_m} = 1$ then $\Delta'(q_1, \ldots, q_p, \pi[p_i/0]) = (q, *)$.

- if $p_m \in T(q)$ and $\pi{\downarrow}_{p_m} = 0$ then there exists a state $q_m \in \{q_1, \ldots, q_p\}$ such that $p_m \in T(q_m)$. Accordingly, we add the transition $\Delta'(q_1, \ldots, (q_m, *), \ldots, q_p, \pi[p_i/1]) = (q, v)$ if there exists a variable $v \in PV \cup Y$ such that $\pi{\downarrow}_v = 1$ and $\pi{\downarrow}_{nil} = 0$. Otherwise we add the transition $\Delta'(q_1, \ldots, (q_m, *), \ldots, q_p, \pi[p_i/1]) = (q, -)$. This covers the case when state $q_m$ was reached immediately after reading variable $p_m$. For the other case, we add transitions $\Delta'(q_1, \ldots, (q_m, v), \ldots, q_p, \pi[p_i/0]) = (q, v)$ for all $v \in PV \cup Y \cup \{-\}$.

Note that the final evaluation formula is only associated with states of the form $(q, v)$ or $(q, -)$. For all $q \in Q$ such that $p_m \in T(q)$ and $v \in PV \cup Y$, $f'(q, v) = \exists d.\, f(q)[p_i \to data / d] \sqcap (p_i \to data = v \to data)$. Otherwise, $f'(q, -) = \exists d.\, f(q)[p_i \to data / d]$. For all other states, the evaluation formula is false.

Finally, for all $q' \in Q'$, the type associated with $q'$ is given as:

$$T'(q') = \begin{cases} T(q') \setminus \{p_i\} & \text{if } q' \in Q,\ p_m \notin T(q') \\ T(q) \setminus \{p_i\} & \text{if } q' = (q, *),\ q \in Q \\ T(q) \cup \{p_i\} & \text{if } q' = (q, v),\ q \in Q,\ v \in PV \cup Y \cup \{-\} \end{cases}$$

Case 3 ($s = new\ p_i$): The statement $s$ allocates a new node which is pointed to by variable $p_i$ and is added as a child to the root of data trees accepted by the original automaton. The universal variables, apart from the nodes already present in the data trees, now also have to valuate over the newly allocated node. The set of states of the new automaton is $Q' = (Q \cup \{\hat{q}\}) \times (Y \cup \{-\})$ where $\hat{q} \notin Q$. The states of the form $(Q \cup \{\hat{q}\}, y)$ are used to accept valuation trees where the universal variable $y$ valuates over the newly allocated node whereas the other universal variables valuate over the existing nodes present in the heap tree. The states $(Q \cup \{\hat{q}\}, -)$ are used to accept valuation trees where none of the universal variables valuates to the newly allocated node. The state $\hat{q}$ is used to transition the automaton to a special state on reading the new node labeled with pointer variable $p_i$, i.e. $\Delta'((\{p_i\}, y)) = (\hat{q}, y)$ for all $y \in Y \cup \{-\}$. Also, for all transitions $\Delta(q_1, \ldots, q_p, \pi) = q$:

- if $\pi{\downarrow}_{root} = 0$, $\Delta'((q_1, y), \ldots, (q_p, y), \pi[p_i/0, y/0]) = (q, y)$ for all $y \in Y \cup \{-\}$.

- if $\pi{\downarrow}_{root} = 1$, $\Delta'((q_1, y), \ldots, (q_p, y), (\hat{q}, y), \pi[p_i/0, y/0]) = (q, y)$ for all $y \in Y \cup \{-\}$.

The final evaluation formula is given as:

$$f'(q, y) = \begin{cases} \exists d_1 d_2.\, f(q)[p_i \to data / d_1,\ y \to data / d_2] \sqcap (p_i \to data = y \to data) & \text{if } q \in Q,\ y \in Y \\ \exists d.\, f(q)[p_i \to data / d] & \text{if } q \in Q,\ y = - \\ false & \text{if } q = \hat{q} \end{cases}$$

Also, the types for each state in the new automaton are:

$$T'(q, y) = \begin{cases} \{p_i\} & \text{if } q = \hat{q},\ y = - \\ \{p_i, y\} & \text{if } q = \hat{q},\ y \in Y \\ T(q) \setminus \{p_i\} & \text{if } q \in Q,\ root \notin T(q),\ y = - \\ T(q) \setminus \{p_i, y\} & \text{if } q \in Q,\ root \notin T(q),\ y \in Y \\ T(q) & \text{if } q \in Q,\ root \in T(q) \end{cases}$$



Case 4 ($s = p_m \to next := p_i$): Firstly, $Q' = Q \cup (Q, *) \cup \{(q_1, q_2) \mid q_1, q_2 \in Q,\ p_i \in T(q_1) \text{ iff } p_m \notin T(q_1)\}$. From the semantics of the strongest post-condition, we know that the new automaton removes any sub-tree rooted at $p_m$ and attaches it as an additional child to a node labelled with variable $p_i$. States of the form $(Q, *)$ are special states to which the automaton transitions on reading the variable $p_m$. If $(q_2, *)$ accepts a tree $t_m$ rooted at $p_m$, then the state $(q_1, q_2)$ where $p_i \in T(q_1)$ accepts a tree which has $t_m$ as an additional child to an internal node labelled with $p_i$. On the other hand, if $p_m \in T(q_1)$ then $(q_1, q_2)$ accepts a tree which had its subtree $t_m$, rooted at $p_m$, removed. Describing the transition relation in detail, for a transition $\Delta(q_1, \ldots, q_p, \pi) = q$:

1. if $\{p_i, p_m\} \cap T(q) = \emptyset$, then we add the same transition to the new automaton, i.e. $\Delta'(q_1, \ldots, q_p, \pi) = q$.

2. if $p_m \in T(q)$ and $p_i \notin T(q)$

   - and if $\pi{\downarrow}_{p_m} = 1$, then the automaton should transition to a state of the form $(Q, *)$; therefore $\Delta'(q_1, \ldots, q_p, \pi) = (q, *)$.

   - otherwise, there exists a state $q_m \in \{q_1, \ldots, q_p\}$ such that $p_m \in T(q_m)$. In case $q_m$, in the original automaton, accepted trees rooted at $p_m$, the new automaton should remove $q_m$ from the left hand side of the transition and should transition to a state $(q, q_m)$ via $\Delta'(q_1, \ldots, q_{m-1}, q_{m+1}, \ldots, q_p, \pi) = (q, q_m)$. To handle the other case, where $q_m$ accepted trees which were not rooted at $p_m$, the transitions $\Delta'(q_1, \ldots, (q_m, q'), \ldots, q_p, \pi) = (q, q')$ are added for all $q' \in Q$.

3. if $p_i \in T(q)$ and $p_m \notin T(q)$

   - and if $\pi{\downarrow}_{p_i} = 1$, then the new automaton should accept a tree at $q$ which has an additional child $t_m$ rooted at $p_m$. Since all trees rooted at $p_m$ are accepted at states of the form $(Q, *)$ (the first subcase of 2 above), the transition $\Delta'(q_1, \ldots, q_p, (q_m, *), \pi) = (q, q_m)$ is added for all $q_m \in Q$.

   - otherwise, there exists a state $q_i \in \{q_1, \ldots, q_p\}$ such that $p_i \in T(q_i)$, and the fact that any node labelled with $p_i$ in the tree accepted at $q$ has as an additional child a tree $t_m$ rooted at $p_m$ is propagated recursively via the transitions $\Delta'(q_1, \ldots, (q_i, q_m), \ldots, q_p, \pi) = (q, q_m)$ for all $q_m \in Q$.

4. if $\{p_i, p_m\} \subseteq T(q)$

   - and $\pi{\downarrow}_{p_m} = 1$ (regardless of the value of $\pi{\downarrow}_{p_i}$), no transition is added to $\Delta'$, as any heap configuration accepted by the original automaton via this transition leads to a cycle on the execution of statement stmt.

   - otherwise if $\pi{\downarrow}_{p_m} = 0$ and $\pi{\downarrow}_{p_i} = 1$, there will exist a state $q_m \in \{q_1, \ldots, q_p\}$ such that $p_m \in T(q_m)$. The corresponding state in the new transition will be $(q_m, q')$ if $q'$ was the state of the original automaton which accepted the internal subtree $t_m$ rooted at $p_m$. Since $\pi{\downarrow}_{p_i} = 1$, an additional state $(q', *)$ is added to the left hand side of the transition to ensure that the new automaton accepts the tree which has $t_m$ as an additional child to a node labelled with $p_i$. Formally, $\Delta'(q_1, \ldots, (q_m, q'), \ldots, q_p, (q', *), \pi) = q$ for all $q' \in Q$.

   - otherwise if $\pi{\downarrow}_{p_m} = \pi{\downarrow}_{p_i} = 0$ and there exists a state $q_{im} \in \{q_1, \ldots, q_p\}$ such that $\{p_i, p_m\} \subseteq T(q_{im})$, then the transition remains unchanged, i.e. $\Delta'(q_1, \ldots, q_{im}, \ldots, q_p, \pi) = q$.

   - otherwise if $\pi{\downarrow}_{p_m} = \pi{\downarrow}_{p_i} = 0$ and there exist states $q_i, q_m \in \{q_1, \ldots, q_p\}$ such that $p_i \in T(q_i)$ and $p_m \in T(q_m)$, then the transition $\Delta'(q_1, \ldots, (q_i, q'), \ldots, (q_m, q'), \ldots, q_p, \pi) = q$ is added for all $q' \in Q$. Note that $(q_i, q')$ accepts a tree which has an additional child (accepted at $(q', *)$) at a node labelled with $p_i$ (explained in case 3 above) and $(q_m, q')$ accepts a tree where the internal subtree rooted at $p_m$ and accepted at $(q', *)$ has been removed (explained in the second subcase of 2 above). Note that if $q_m = q'$ then the state $(q_m, q')$ is removed from the left hand side of the transition $\Delta'$, i.e. $\Delta'(q_1, \ldots, (q_i, q'), \ldots, q_p, \pi) = q$.

The final evaluation formula is unchanged for the states $Q \subseteq Q'$; it is false for the newly added states, i.e. $f'(q) = f(q)$ for $q \in Q$ and $f'(q, *) = f'(q_1, q_2) = false$. The type $T'$ for the new automaton is defined as:

$$T'(q') = \begin{cases} T(q) & \text{if } q' = q \in Q \\ T(q) & \text{if } q' = (q, *),\ q \in Q \\ T(q_1) \cup T(q_2) & \text{if } q' = (q_1, q_2),\ p_i \in T(q_1) \\ T(q_1) \setminus T(q_2) & \text{if } q' = (q_1, q_2),\ p_m \in T(q_1) \end{cases}$$

Case 5 ($s = p_m \to data := a$): On execution of this statement, the structure component of the data trees accepted by the automaton is unchanged; however, the final evaluation function now has to record the fact that the data pointed to by variable $p_m$ is assigned the value of $a$. If, for a particular valuation tree, $p_m$ is co-read with a variable $v \in PV \cup Y$ before it is accepted at state $q$, $f'(q)$ should also record that the data value pointed to by variable $v$ is now assigned $a$. So the new automaton needs to track the set of variables which are co-read with $p_m$ for a particular valuation tree. Hence $Q' = \{q \mid q \in Q,\ p_m \notin T(q)\} \cup \{(q, S) \mid q \in Q,\ p_m \in T(q),\ S \subseteq PV \cup Y\}$. Regarding the transition relation, for all transitions $\Delta(q_1, \ldots, q_p, \pi) = q$,

- if $p_m \notin T(q)$, $\Delta'(q_1, \ldots, q_p, \pi) = q$.

- otherwise if $p_m \in T(q)$

   - and if $\pi{\downarrow}_{p_m} = 1$, then $\Delta'(q_1, \ldots, q_p, \pi) = (q, S)$ where $\forall s \in S.\ \pi{\downarrow}_s = 1$.

   - otherwise if $\pi{\downarrow}_{p_m} = 0$, then there exists a state $q_m \in \{q_1, \ldots, q_p\}$ such that $p_m \in T(q_m)$. Consequently, we add transitions $\Delta'(q_1, \ldots, (q_m, S), \ldots, q_p, \pi) = (q, S)$ for all $S \subseteq PV \cup Y$.

The final evaluation function $f'$ is given as: $f'(q) = f(q)$ for all $q \in Q \cap Q'$; $f'(q, S) = \exists d.\, f(q)[v_1 \to data / d, \ldots, v_\ell \to data / d] \sqcap (v_1 \to data = a) \sqcap \ldots \sqcap (v_\ell \to data = a)$ where $S = \{v_1, \ldots, v_\ell\}$ and includes variable $p_m$. The type for each state of the new automaton is the same as the type in the original automaton, i.e. $T'(q) = T(q)$ for all $q \in Q \cap Q'$, while $T'(q, S) = T(q)$ for all $S$ in the remaining states.

Case 6 ($s = assume(p_i = p_j)$): The output QSDA is obtained by removing from the input QSDA the transitions where variables $p_i$ and $p_j$ do not occur together. Formally, $Q' = Q$, $f' = f$, $T' = T$ and for all transitions $\Delta(q_1, \ldots, q_p, \pi) = q$ in the input QSDA, $\Delta'(q_1, \ldots, q_p, \pi) = q$ iff $\pi{\downarrow}_{p_i} = \pi{\downarrow}_{p_j}$.

Case 7 ($s = assume(p_i \neq p_j)$): The output QSDA is obtained by removing from the input QSDA the transitions where variables $p_i$ and $p_j$ occur together. Formally, $Q' = Q$, $f' = f$, $T' = T$ and for all transitions $\Delta(q_1, \ldots, q_p, \pi) = q$ in the input QSDA, $\Delta'(q_1, \ldots, q_p, \pi) = q$ iff $\pi{\downarrow}_{p_i} = 0$ or $\pi{\downarrow}_{p_j} = 0$ or both.

Case 8 ($s = assume\ \psi_{data}$): In this case, $Q' = Q$ and the transition relation $\Delta'$ is the same as $\Delta$. The type $T' = T$ and for all $q \in Q$, $f'(q) = f(q) \sqcap \psi_{data}$.
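The three `assume` cases are simple filters, which the following Python sketch makes explicit. It rests on assumptions introduced only for illustration: transitions are a dictionary from `(children, label)` to a state, a label is a `frozenset` of the pointer variables read at a node, and a data formula is a `frozenset` of atomic constraints read conjunctively, so that the meet of two formulas is their union. The function names are hypothetical.

```python
def assume_eq(delta, pi, pj):
    """Case 6: keep the transitions whose label reads pi and pj together or not at all."""
    return {key: q for key, q in delta.items()
            if (pi in key[1]) == (pj in key[1])}


def assume_neq(delta, pi, pj):
    """Case 7: drop the transitions whose label reads both pi and pj."""
    return {key: q for key, q in delta.items()
            if not (pi in key[1] and pj in key[1])}


def assume_data(f, psi):
    """Case 8: conjoin psi onto every state's formula (meet = union of atoms)."""
    return {q: phi | psi for q, phi in f.items()}
```

In all three cases the states and types are left untouched, matching $Q' = Q$ and $T' = T$ in the text.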

The transition relation $\Delta'$ thus constructed might need to be determinized to obtain $A'$. For a symbolic tree $t$ and formulas $\varphi_1, \ldots, \varphi_j$ such that $(t, \varphi_1), \ldots, (t, \varphi_j)$ belong to the language, the determinization procedure maps $t$ to the formula $\varphi_1 \sqcup \ldots \sqcup \varphi_j$. The determinization procedure is similar to the powerset construction for determinizing tree automata, except that for any set of states, the final evaluation function is now assigned to be the join of the formulas being mapped to the individual states in the set.
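The powerset construction with joined formulas can be sketched for a generic bottom-up tree automaton as follows. This is a minimal illustration, not the prototype's code: the non-deterministic transition relation is assumed to be a set of `(children, label, target)` triples, the `join` operator is passed in by the caller, and the name `determinize` is hypothetical.

```python
from functools import reduce
from itertools import product


def determinize(ndelta, f, join):
    """Subset construction for a bottom-up tree automaton.

    ndelta: set of (children, label, target) triples
    f:      dict mapping each original state to its formula
    join:   binary join on formulas
    Returns (ddelta, df): deterministic transitions over frozensets of states,
    and the formula attached to each reachable set of states.
    """
    labels = {lab for (_, lab, _) in ndelta}
    arities = {len(ch) for (ch, _, _) in ndelta}
    dstates, ddelta = set(), {}
    changed = True
    while changed:                       # iterate until no new transition appears
        changed = False
        for lab in labels:
            for n in arities:
                for kids in product(list(dstates), repeat=n):
                    if (kids, lab) in ddelta:
                        continue
                    # All states reachable when child k is in some state of kids[k].
                    target = frozenset(
                        q for (ch, l, q) in ndelta
                        if l == lab and len(ch) == n
                        and all(c in S for c, S in zip(ch, kids)))
                    if target:           # the empty set acts as an implicit sink
                        ddelta[(kids, lab)] = target
                        dstates.add(target)
                        changed = True
    df = {S: reduce(join, (f[q] for q in S)) for S in dstates}
    return ddelta, df
```

With formulas modelled as sets of atomic constraints read conjunctively (an assumption made only for this example), the join is set intersection: the result keeps exactly the constraints common to all states in the set.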